1 Introduction
Let $(\Omega ,\mathfrak{F},\mathbf{P})$ be a probability space supporting all distributions considered below. For any $N\ge 1$ introduce the family of discrete distributions $p=({p_{1}},{p_{2}},\dots ,{p_{N}})$ with probabilities ${p_{k}}\ge 0$, $1\le k\le N$, ${\textstyle\sum _{k=1}^{N}}{p_{k}}=1$.
In the present paper we investigate some properties of the Rényi entropy, which was proposed by Rényi in [14],
\[ {\mathcal{H}_{\alpha }}(p)=\frac{1}{1-\alpha }\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right),\hspace{0.2778em}\alpha >0,\hspace{0.2778em}\alpha \ne 1,\]
including its limit value as $\alpha \to 1$, i.e., the Shannon entropy
\[ \mathcal{H}(p)=-{\sum \limits_{k=1}^{N}}{p_{k}}\log {p_{k}}.\]
Due to this continuity, it is possible to put ${\mathcal{H}_{1}}(p)=\mathcal{H}(p)$. We consider the Rényi entropy as a functional of various parameters. The first approach is to fix the distribution and consider ${\mathcal{H}_{\alpha }}(p)$ as a function of $\alpha >0$. Some of the properties of ${\mathcal{H}_{\alpha }}(p)$ as a function of $\alpha >0$ are well known. In particular, it is known that ${\mathcal{H}_{\alpha }}(p)$ is continuous and nonincreasing in $\alpha \in (0,\infty )$, ${\lim \nolimits_{\alpha \to 0+}}{\mathcal{H}_{\alpha }}(p)=\log m$, where m is the number of nonzero probabilities, and ${\lim \nolimits_{\alpha \to +\infty }}{\mathcal{H}_{\alpha }}(p)=-\log {\max _{k}}{p_{k}}$. However, for the reader’s convenience, we provide short proofs of these and some other simple statements in the Appendix. One can see that these properties of the entropy itself and of its first derivative are common to all finite distributions. Also, it is known that the Rényi entropy is Schur concave as a function of a discrete distribution, that is,
\[ ({p_{i}}-{p_{j}})\left(\frac{\partial {\mathcal{H}_{\alpha }}(p)}{\partial {p_{i}}}-\frac{\partial {\mathcal{H}_{\alpha }}(p)}{\partial {p_{j}}}\right)\le 0,\hspace{0.2778em}i\ne j.\]
Some additional results, such as lower bounds on the difference of Rényi entropies for distributions defined on countable alphabets, can be found in [12]. Those results usually use the Rényi divergence of order α of a distribution P from a distribution Q
\[ {D_{\alpha }}\left(P||Q\right)=\frac{1}{\alpha -1}\log \left({\sum \limits_{i=1}^{N}}\frac{{p_{i}^{\alpha }}}{{q_{i}^{\alpha -1}}}\right),\]
which is very similar to the Kullback–Leibler divergence. Some of the most important properties of the Rényi divergence were reviewed and extended in [7]. Rényi divergences for the most commonly used univariate continuous distributions can be found in [8]. The Rényi entropy and divergence are widely used in majorization theory [6, 15], statistics [4, 13], information theory [12, 7, 1] and many other fields. Boundedness of the Rényi entropy of discrete log-concave distributions, in terms of the variance, was established in [5]. Other operational definitions of the Rényi entropy, which are used in practice, are given in [11]. The Rényi entropy is also used in the analysis of financial time series. As stated in [16], it can deal effectively with heavy-tailed distributions and reflect short-range characteristics of financial time series. So, in some sense, entropy can be connected to the long- and short-range dependence of the selected stochastic processes and allows one to compare the memory inherent in various processes. Other methods of comparing the memory of processes are proposed, e.g., in [3] and [2]. Moreover, the Rényi entropy is used in physics. For it to be physically meaningful as a thermostatistical quantity, it should not change drastically if the probability distribution is slightly changed; it is important that experimental uncertainty in determining the distribution function does not cause the entropy to diverge. In [9] it is shown that the Rényi entropy is uniformly continuous for probabilities on finite sets. In our paper we go further and find the rate of the respective convergence.
In the present paper we restrict ourselves to the standard Rényi entropy and go a step beyond the comparison of standard properties, namely, we investigate the convexity of the Rényi entropy with the help of the second derivative. It turns out that from this point of view the situation is much more interesting and uncertain than the behavior of the first derivative, and it crucially depends on the distribution. One might say that all the standard guesses are wrong. Of course, the second derivative is continuous (this simply means that it is continuous at 1, because at all other points the continuity is obvious), but then the surprises begin. If the second derivative starts with a positive value at zero, it can either remain positive or have inflection points, depending on the distribution. If it starts from a negative value, its first inflection point can be located either before 1 or after 1, again depending on the distribution (the point 1 is a crucial point for the entropy, so we compare the location of the inflection points with it). The value of the second derivative at zero is bounded from below but unbounded from above. Due to the complexity of some expressions, which defied analytical treatment, we provide several illustrations performed by numerical methods. We also investigate the robustness of the Rényi entropy w.r.t. the distribution, and it turns out that the rate of the respective convergence depends on the initial distribution, too. Further, we establish convergence of the disturbed entropy when the initial distribution is uniform but the number of events increases to ∞, and prove that the limit of the Rényi entropy of the binomial distribution equals the Rényi entropy of the Poisson distribution. It was previously proved in [10] that the Shannon entropy of the binomial distribution increases to the entropy of the Poisson distribution.
Our proof of this particular fact is simpler because it uses only Lebesgue’s dominated convergence theorem. The paper is organized as follows. Section 2 is devoted to the convexity properties of the Rényi entropy, Section 3 describes the robustness of the Rényi entropy, and Appendix A contains some auxiliary results.
2 Convexity of the Rényi entropy
To start, we consider the general properties of the 2nd derivative of the Rényi entropy.
2.1 The form and the continuity of the 2nd derivative
Let us denote ${S_{i}}(\alpha )={\textstyle\sum _{k=1}^{N}}{p_{k}^{\alpha }}{\log ^{i}}{p_{k}}$, $i=0,1,2,3$. Denote also $f(\alpha )=\log \big({\textstyle\sum _{k=1}^{N}}{p_{k}^{\alpha }}\big)$. Obviously, the function $f\in {C^{\infty }}({\mathbb{R}^{+}})$, and its first three derivatives equal
\[\begin{array}{l}\displaystyle {f^{\prime }}(\alpha )=\frac{{S_{1}}(\alpha )}{{S_{0}}(\alpha )},\hspace{0.2778em}{f^{\prime\prime }}(\alpha )=\frac{{S_{2}}(\alpha ){S_{0}}(\alpha )-{S_{1}^{2}}(\alpha )}{{S_{0}^{2}}(\alpha )},\\ {} \displaystyle {f^{\prime\prime\prime }}(\alpha )=\frac{{S_{3}}(\alpha ){S_{0}^{2}}(\alpha )-3{S_{2}}(\alpha ){S_{1}}(\alpha ){S_{0}}(\alpha )+2{S_{1}^{3}}(\alpha )}{{S_{0}^{3}}(\alpha )}.\end{array}\]
In particular, if one considers the random variable ξ taking values $\log {p_{k}}$ with probability ${p_{k}}$, then
(1)
\[ \begin{array}{c}\displaystyle {f^{\prime }}(1)=E(\xi )<0,\hspace{0.2778em}{f^{\prime\prime }}(1)=E({\xi ^{2}})-{(E(\xi ))^{2}}>0,\\ {} \displaystyle {f^{\prime\prime\prime }}(1)=E({\xi ^{3}})-3E({\xi ^{2}})E(\xi )+2{(E(\xi ))^{3}},\end{array}\]
and the sign of ${f^{\prime\prime\prime }}(1)$ is not fixed (as we can see below, it can be both + and −).
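The identities in (1) are easy to check numerically. The following minimal Python sketch (an illustration of ours; the distribution $(0.4,0.4,0.2)$ is that of Figure 1 below, while $(0.4,0.3,0.3)$ is an extra choice) computes ${S_{i}}(\alpha )$, evaluates ${f^{\prime }}$, ${f^{\prime\prime }}$, ${f^{\prime\prime\prime }}$ at $\alpha =1$, verifies that ${f^{\prime }}(1)=E(\xi )$ and ${f^{\prime\prime }}(1)=\operatorname{Var}(\xi )$, and shows that ${f^{\prime\prime\prime }}(1)$ takes values of opposite signs for the two choices.

```python
import numpy as np

def S(p, alpha, i):
    """S_i(alpha) = sum_k p_k^alpha * (log p_k)^i for a strictly positive vector p."""
    lp = np.log(p)
    return np.sum(p**alpha * lp**i)

def f_derivatives(p, alpha):
    """First three derivatives of f(alpha) = log(sum_k p_k^alpha), as in the text."""
    S0, S1, S2, S3 = (S(p, alpha, i) for i in range(4))
    f1 = S1 / S0
    f2 = (S2 * S0 - S1**2) / S0**2
    f3 = (S3 * S0**2 - 3 * S2 * S1 * S0 + 2 * S1**3) / S0**3
    return f1, f2, f3

for p in (np.array([0.4, 0.4, 0.2]),    # Figure 1 distribution
          np.array([0.4, 0.3, 0.3])):   # extra example with the opposite sign of f'''(1)
    f1, f2, f3 = f_derivatives(p, 1.0)
    xi_mean = np.sum(p * np.log(p))              # E(xi), xi = log p_k with probability p_k
    xi_var = np.sum(p * np.log(p)**2) - xi_mean**2
    assert np.isclose(f1, xi_mean) and np.isclose(f2, xi_var)
    print(p, "f'(1)=%.4f  f''(1)=%.4f  f'''(1)=%+.5f" % (f1, f2, f3))
```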
Lemma 1.
Let ${p_{k}}\ne 0$ for all $1\le k\le N$. Then
Proof.
Equality (2) is a result of direct calculations. Concerning equality (3), recall that $f(1)=\log {\textstyle\sum _{k=1}^{N}}{p_{k}}=0$, so we can present ${\mathcal{H}_{\alpha }}(p)$ as
\[ {\mathcal{H}_{\alpha }}(p)=\frac{f(\alpha )}{1-\alpha }=-\frac{f(\alpha )-f(1)}{\alpha -1};\]
therefore, $-{\mathcal{H}_{\alpha }}(p)$ is a slope function for f. Taking successive derivatives, we get from the standard Taylor formula that
\[ {\mathcal{H}_{\alpha }^{{^{\prime }}}}(p)=\frac{{f^{\prime }}(\alpha )(1-\alpha )+f(\alpha )}{{(1-\alpha )^{2}}}=-\frac{1}{2}{f^{{^{\prime\prime }}}}(\eta ),\]
and
\[ {\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)=\frac{{f^{\prime\prime }}(\alpha ){(1-\alpha )^{2}}+2{f^{\prime }}(\alpha )(1-\alpha )+2f(\alpha )}{{(1-\alpha )^{3}}}=-\frac{1}{3}{f^{{^{\prime\prime\prime }}}}(\theta ),\]
where η and θ lie between 1 and α. If $\alpha \to 1$, then both η and θ tend to 1. Taking into account (1), we immediately get both equality (3) and statement $(ii)$. □
2.2 Behavior of the 2nd derivative at the origin
Let us consider the starting point for the 2nd derivative, i.e., the behavior of ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$ at zero as a function of the distribution vector p. Analyzing (2), we see that ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$ as a function of α is continuous at 0. Moreover, denoting ${q_{k}}(\alpha )={p_{k}^{\alpha }}/{S_{0}}(\alpha )$, we have
\[ {q_{k}}(0)=1/N,\hspace{0.2778em}{q^{\prime }_{k}}(0)=\frac{\log {p_{k}}}{N}-\frac{{\textstyle\textstyle\sum _{k=1}^{N}}\log {p_{k}}}{{N^{2}}},\]
so we can present ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ as
\[\begin{aligned}{}{\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)& =-{\sum \limits_{k=1}^{N}}\left(\frac{1}{N}\log {p_{k}}-\frac{1}{{N^{2}}}{\sum \limits_{i=1}^{N}}\log {p_{i}}+\frac{2}{N}\right)\log \frac{1}{N{p_{k}}}\\ {} & ={\sum \limits_{k=1}^{N}}\left(\frac{1}{N}\log {p_{k}}-\frac{1}{{N^{2}}}{\sum \limits_{i=1}^{N}}\log {p_{i}}+\frac{2}{N}\right)\left(\log N+\log {p_{k}}\right)\\ {} & =2\log N+\frac{1}{N}{\sum \limits_{k=1}^{N}}{(\log {p_{k}})^{2}}-\frac{1}{{N^{2}}}{\left({\sum \limits_{k=1}^{N}}\log {p_{k}}\right)^{2}}+\frac{2}{N}{\sum \limits_{k=1}^{N}}\log {p_{k}}.\end{aligned}\]
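The closed-form expression just obtained is convenient for numerical experiments. The following short sketch (an illustration of ours, not part of the derivation) evaluates it for the distribution of Figure 1 and compares the result with a finite-difference approximation of the second derivative of $\alpha \mapsto {\mathcal{H}_{\alpha }}(p)$ near $\alpha =0$.

```python
import numpy as np

def renyi(p, alpha):
    """Renyi entropy H_alpha(p) for alpha != 1 (natural logarithm)."""
    return np.log(np.sum(p**alpha)) / (1.0 - alpha)

def H2_at_zero(p):
    """Closed-form value of the second derivative of alpha -> H_alpha(p) at alpha = 0."""
    N, lp = len(p), np.log(p)
    return (2 * np.log(N) + np.sum(lp**2) / N
            - np.sum(lp)**2 / N**2 + 2 * np.sum(lp) / N)

p = np.array([0.4, 0.4, 0.2])        # Figure 1 distribution: the value is positive
a, h = 1e-3, 1e-3                    # small alpha and step for a finite-difference check
fd = (renyi(p, a + h) - 2 * renyi(p, a) + renyi(p, a - h)) / h**2
print(H2_at_zero(p), fd)             # the two numbers should nearly coincide
```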
Now we are interested in the sign of ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$. It is very simple to give an example of a distribution for which ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)>0$; one such example is given in Figure 1. Negative ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ is also possible, however, at this moment we prefer to start with a more general result.
Lemma 2.
If some probability vector p is a point of local extremum of ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$, then either $p=p(uniform)=\left(\frac{1}{N},\dots ,\frac{1}{N}\right)$ or p contains exactly two different probability values.
Proof.
Let us formulate the necessary conditions for ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ to have a local extremum at some point. Taking into account the constraint ${\textstyle\sum _{k=1}^{N}}{p_{k}}=1$, these conditions have the form
\[ \left\{\begin{array}{l}2\log N+\frac{1}{N}{\textstyle\textstyle\sum _{k=1}^{N}}{(\log {p_{k}})^{2}}-\frac{1}{{N^{2}}}{\left({\textstyle\textstyle\sum _{k=1}^{N}}\log {p_{k}}\right)^{2}}+\frac{2}{N}{\textstyle\textstyle\sum _{k=1}^{N}}\log {p_{k}}\longrightarrow extr\hspace{1em}\\ {} {\textstyle\textstyle\sum _{k=1}^{N}}{p_{k}}=1.\hspace{1em}\end{array}\right.\]
We write the Lagrangian function
\[\begin{array}{c}\displaystyle L={\lambda _{0}}\left(2\log N+\frac{1}{N}{\sum \limits_{k=1}^{N}}{(\log {p_{k}})^{2}}-\frac{1}{{N^{2}}}{\left({\sum \limits_{k=1}^{N}}\log {p_{k}}\right)^{2}}+\frac{2}{N}{\sum \limits_{k=1}^{N}}\log {p_{k}}\right)\\ {} \displaystyle +\lambda \left({\sum \limits_{k=1}^{N}}{p_{k}}-1\right).\end{array}\]
If some p is an extremum point then there exist ${\lambda _{0}}$ and λ such that ${\lambda _{0}^{2}}+{\lambda ^{2}}\ne 0$ and $\frac{\partial L}{\partial {p_{i}}}(p)=0$ for all $1\le i\le N$, i.e.,
\[ \frac{\partial L}{\partial {p_{i}}}={\lambda _{0}}\left(\frac{2}{N{p_{i}}}\log {p_{i}}-\frac{2}{{N^{2}}{p_{i}}}\left({\sum \limits_{k=1}^{N}}\log {p_{k}}\right)+\frac{2}{N{p_{i}}}\right)+\lambda =0.\]
If ${\lambda _{0}}=0$ then $\lambda =0$. However, ${\lambda _{0}^{2}}+{\lambda ^{2}}\ne 0$, therefore we can put ${\lambda _{0}}=1$. Then
\[ -\lambda {p_{i}}=\frac{2}{N}\log {p_{i}}-\frac{2}{{N^{2}}}\left({\sum \limits_{k=1}^{N}}\log {p_{k}}\right)+\frac{2}{N}.\]
Taking the sum of these equalities, we get that $\lambda =-2$, whence
\[ {p_{i}}-\frac{1}{N}\log {p_{i}}=\frac{1}{N}-\frac{1}{{N^{2}}}{\sum \limits_{k=1}^{N}}\log {p_{k}},\hspace{0.2778em}1\le i\le N.\]
So, if the distribution vector p is an extremum point then ${p_{1}}-\frac{1}{N}\log {p_{1}}=\cdots ={p_{N}}-\frac{1}{N}\log {p_{N}}$. Let us take a look at the continuous function $f(x)=x-\frac{1}{N}\log x$, $x\in (0,1)$. Its derivative equals
\[\begin{array}{l}\displaystyle {f^{\prime }}(x)=1-\frac{1}{Nx}=0\Leftrightarrow x=\frac{1}{N},\hspace{0.2778em}\operatorname{sign}({f^{\prime }}(x))=\operatorname{sign}\left(x-\frac{1}{N}\right),\\ {} \displaystyle \underset{x\to 0+}{\lim }f(x)=+\infty ,\hspace{0.2778em}\underset{x\to 1-}{\lim }f(x)=1.\end{array}\]
So, $f(x)$ attains its global minimum at the point $x=\frac{1}{N}$, and for any $f(\frac{1}{N})<y<1$ there exist two points ${x^{\prime }}\ne {x^{\prime\prime }}$, ${x^{\prime }},{x^{\prime\prime }}\in (0,1)$, such that $f({x^{\prime }})=f({x^{\prime\prime }})=y$. Thus, if ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ achieves a local extremum at a vector p, then p contains no more than two different probability values. Obviously, it can be $p=p(uniform)=\left(\frac{1}{N},\dots ,\frac{1}{N}\right)$. □
Remark 1.
Note that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p(uniform))=0$. Therefore, in order to find the distribution for which ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)<0$, let us consider the distribution vector that contains only two different probabilities ${p_{0}}$, ${q_{0}}$ such that:
(5)
\[ \left\{\begin{array}{l}{p_{0}}-{q_{0}}=\frac{1}{N}\left(\log {p_{0}}-\log {q_{0}}\right),\hspace{1em}\\ {} k{p_{0}}+(N-k){q_{0}}=1,\hspace{1em}\end{array}\right.\]
where $N,k\in \mathbb{N}$, $N>k$ and ${p_{0}},{q_{0}}\in (0,1)$.
Lemma 3.
Let p be the distribution vector satisfying (5). Then ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)<0$.
Proof.
First, we will show that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ is nonpositive. For that we rewrite ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ in terms of ${p_{0}}$ and ${q_{0}}$:
\[\begin{aligned}{}{\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)& =2\log N+\frac{1}{N}\left(k{(\log {p_{0}})^{2}}+(N-k){(\log {q_{0}})^{2}}\right)\\ {} & -\frac{1}{{N^{2}}}{\left(k\log {p_{0}}+(N-k)\log {q_{0}}\right)^{2}}+\frac{2}{N}\left(k\log {p_{0}}+(N-k)\log {q_{0}}\right)\\ {} & =2\log N+\frac{k(N-k)}{{N^{2}}}\left({(\log {p_{0}})^{2}}-2\log {p_{0}}\log {q_{0}}+{(\log {q_{0}})^{2}}\right)\\ {} & +\frac{2k}{N}\left(\log {p_{0}}-\log {q_{0}}\right)+2\log {q_{0}}\\ {} & =2\log N{q_{0}}+k(N-k){({p_{0}}-{q_{0}})^{2}}+2k({p_{0}}-{q_{0}}).\end{aligned}\]
We know that $k{p_{0}}+(N-k){q_{0}}=1$, whence $k=\frac{N{q_{0}}-1}{{q_{0}}-{p_{0}}}$, and $N-k=\frac{1-N{p_{0}}}{{q_{0}}-{p_{0}}}$. Then
\[\begin{aligned}{}{\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)& =2\log N{q_{0}}+(1-N{q_{0}})(N{p_{0}}-1)+2(1-N{q_{0}})\\ {} & =2\log N{q_{0}}+N({p_{0}}-{q_{0}})+1-{N^{2}}{p_{0}}{q_{0}}\\ {} & =\log {(N{q_{0}})^{2}}+\log \frac{{p_{0}}}{{q_{0}}}+1-{N^{2}}{p_{0}}{q_{0}}=\log {N^{2}}{p_{0}}{q_{0}}-{N^{2}}{p_{0}}{q_{0}}+1.\end{aligned}\]
Note that $\log x-x+1<0$ for $x>0$, $x\ne 1$. We want to show that under conditions (5) ${N^{2}}{p_{0}}{q_{0}}$ cannot be equal to 1. Suppose that ${N^{2}}{p_{0}}{q_{0}}=1$. Then it follows from (5) that ${q_{0}}=\frac{1}{{N^{2}}{p_{0}}}$ and $k{p_{0}^{2}}-{p_{0}}+\frac{N-k}{{N^{2}}}=0$, i.e., ${p_{0}}$ is a root of a quadratic equation with rational coefficients.
It means that ${p_{0}}$ and ${q_{0}}$ are algebraic numbers. Thus, their difference ${p_{0}}-{q_{0}}$ is also algebraic. On the other hand, by the Lindemann–Weierstrass theorem $\frac{1}{N}(\log {p_{0}}-\log {q_{0}})$ is a transcendental number, which contradicts (5). So ${N^{2}}{p_{0}}{q_{0}}\ne 1$ and ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)<0$. □
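A concrete distribution satisfying (5) is easy to produce numerically. The sketch below (an illustration of ours; the choice $N=10$, $k=6$ is arbitrary) eliminates ${q_{0}}$ via the second equation of (5), finds a nontrivial root ${p_{0}}\ne \frac{1}{N}$ of the first equation by bisection, and evaluates ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$, which comes out (slightly) negative, in accordance with Lemma 3.

```python
import numpy as np

def H2_at_zero(p):
    """Closed-form H''_0(p) from Section 2.2 (natural logarithm)."""
    N, lp = len(p), np.log(p)
    return 2*np.log(N) + np.sum(lp**2)/N - np.sum(lp)**2/N**2 + 2*np.sum(lp)/N

# System (5): p0 - q0 = (log p0 - log q0)/N and k*p0 + (N-k)*q0 = 1.
# Eliminate q0 and look for a nontrivial root p0 != 1/N by bisection on (0, 1/N).
N, k = 10, 6                         # our own illustrative choice
q0 = lambda p0: (1.0 - k*p0) / (N - k)
g  = lambda p0: (p0 - q0(p0)) - (np.log(p0) - np.log(q0(p0))) / N

lo, hi = 1e-9, 1.0/N - 1e-4          # g(lo) > 0 and g(hi) < 0 for this choice of N, k
for _ in range(200):                 # plain bisection
    mid = 0.5*(lo + hi)
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)

p0 = 0.5*(lo + hi)
p = np.array([p0]*k + [q0(p0)]*(N - k))
print("p0=%.6f  q0=%.6f  sum=%.12f  H''_0(p)=%.6f" % (p0, q0(p0), p.sum(), H2_at_zero(p)))
```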
Theorem 1.
For any $n>2$ there exist $N\ge n$ and a probability vector $p=({p_{1}},\dots ,{p_{N}})$ such that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)<0$.
Proof.
Consider a distribution vector p that satisfies conditions (5). From Lemma 3 we know that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)<0$. Now we want to show that, for arbitrarily large $N\in \mathbb{N}$, there exists a distribution vector p of length N satisfying those conditions. For that we denote $x=N{p_{0}}$, $y=N{q_{0}}$ and $r=\frac{k}{N}$.
Then $0<x<1<y$, $r<1$ and $x-y=\log x-\log y$. The function $x-\log x$ is decreasing on $(0,1)$, increasing on $(1,+\infty )$ and equal to 1 at the point 1. Let $y=y(x)$ be the implicit function defined by $x-y=\log x-\log y$; in this way we get a one-to-one correspondence between $x\in (0,1)$ and $y\in (1,+\infty )$. We also have the function $r(x)=\frac{y(x)-1}{y(x)-x}$. If we find ${x^{\prime }}\in (0,1)$ such that ${r^{\prime }}=r({x^{\prime }})$ is rational, then we can pick $N,k\in \mathbb{N}$ such that ${r^{\prime }}=\frac{k}{N}$ and obtain a distribution vector p satisfying (5) with ${p_{0}}=\frac{{x^{\prime }}}{N}$, ${q_{0}}=\frac{y({x^{\prime }})}{N}$. However, we will not look for such ${x^{\prime }}$ explicitly, but will just show that they exist. To do that, observe that $y(x)$ is a continuous function of x and so is the function $r(x)=\frac{y(x)-1}{y(x)-x}$. What is more,
\[ y(x)\to +\infty ,\hspace{0.2778em}x\to 0+\hspace{0.2778em}\mathrm{so}\hspace{0.2778em}r(x)\to 1,\hspace{0.2778em}x\to 0+.\]
Let us fix ${x_{0}}\in (0,1)$; then $r({x_{0}})<1$. For any ${r^{\prime }}\in (r({x_{0}}),1)$ there exists ${x^{\prime }}\in (0,{x_{0}})$ such that $r({x^{\prime }})={r^{\prime }}$. Taking ${r^{\prime }}\in \mathbb{Q}$, we get that there exists ${x^{\prime }}$ such that $r({x^{\prime }})=\frac{k}{N}$ for some $k,N\in \mathbb{N}$ with $\frac{k}{N}<1$. Finally, we want to show that N can be arbitrarily large. For that simply observe that $\frac{k}{N}={r^{\prime }}$, so as ${r^{\prime }}\to 1-$ we get that $N\to +\infty $. □
Lemma 4.
Let N be fixed. Then ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$, as a function of the vector p, is bounded from below and unbounded from above.
Proof.
Recall that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)=0$ for the uniform distribution; we exclude this case from further consideration. In order to simplify the notation, we denote ${x_{k}}=\log {p_{k}}$, and let
\[ {S_{N}}:=N({\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)-2\log N)={\sum \limits_{k=1}^{N}}{({x_{k}})^{2}}-\frac{1}{N}{\left({\sum \limits_{k=1}^{N}}{x_{k}}\right)^{2}}+2{\sum \limits_{k=1}^{N}}{x_{k}}.\]
Note that, possibly after renumbering the probabilities, there exists n with $1\le n\le N-1$ such that
\[ {x_{1}}<\log \frac{1}{N},\dots ,{x_{n}}<\log \frac{1}{N},\hspace{0.2778em}{x_{n+1}}\ge \log \frac{1}{N},\dots ,{x_{N}}\ge \log \frac{1}{N}.\]
Further, denote the rectangle $A={[\log \frac{1}{N};0]^{N-n}}\subset {\mathbb{R}^{N-n}}$, and let
\[ {S_{N,1}}={\sum \limits_{k=1}^{n}}{x_{k}},\hspace{0.2778em}{S_{N,2}}={\sum \limits_{k=n+1}^{N}}{x_{k}}.\]
Let us establish that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ is bounded from below. In this connection, rewrite ${S_{N}}$ as
\[ {S_{N}}={\sum \limits_{k=1}^{n}}{x_{k}^{2}}+{\sum \limits_{k=n+1}^{N}}{x_{k}^{2}}-\frac{1}{N}\left({({S_{N,1}})^{2}}+2{S_{N,1}}{S_{N,2}}+{({S_{N,2}})^{2}}\right)+2{S_{N,1}}+2{S_{N,2}}.\]
By the Cauchy–Schwarz inequality we have
\[ {\left({\sum \limits_{k=1}^{n}}{x_{k}}\right)^{2}}\le n{\sum \limits_{k=1}^{n}}{x_{k}^{2}},\hspace{0.2778em}{\left({\sum \limits_{k=n+1}^{N}}{x_{k}}\right)^{2}}\le (N-n){\sum \limits_{k=n+1}^{N}}{x_{k}^{2}}.\]
Therefore
\[\begin{aligned}{}{S_{N}}& \ge \left(1-\frac{n}{N}\right){\sum \limits_{k=1}^{n}}{x_{k}^{2}}+\frac{n}{N}{\sum \limits_{k=n+1}^{N}}{x_{k}^{2}}-\frac{2}{N}{S_{N,1}}{S_{N,2}}+2{S_{N,1}}+2{S_{N,2}}\\ {} & ={\sum \limits_{k=1}^{n}}\left(\left(1-\frac{n}{N}\right){x_{k}^{2}}+{x_{k}}\left(2-\frac{2}{N}{S_{N,2}}\right)\right)+\frac{n}{N}{\sum \limits_{k=n+1}^{N}}{x_{k}^{2}}+2{S_{N,2}}\\ {} & =\frac{1}{N}{\sum \limits_{k=1}^{n}}\left(\left(N-n\right){x_{k}^{2}}+2{x_{k}}\left(N-{S_{N,2}}\right)\right)+\frac{n}{N}{\sum \limits_{k=n+1}^{N}}{x_{k}^{2}}+2{S_{N,2}}.\end{aligned}\]
There exists $M>0$ such that for every $n\le N-1$ we have $|{S_{N,2}}|\le M$, because A is compact and ${S_{N,2}}$ is continuous on A. Obviously, $\frac{n}{N}{\textstyle\sum _{k=n+1}^{N}}{x_{k}^{2}}\ge 0$. Finally, for every $1\le k\le n$ the expression $\left(N-n\right){x_{k}^{2}}+2{x_{k}}\left(N-{S_{N,2}}\right)$ is bounded from below by the value $-\frac{{(N-{S_{N,2}})^{2}}}{N-n}\ge -{(N+M)^{2}}$. Summing up, we get that ${S_{N}}$ is bounded from below, and consequently ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ is bounded from below for fixed N.
Now we want to establish that ${\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)$ is not bounded from above. In this connection, let $\varepsilon >0$, and let us consider the distribution of the form ${p_{1}}=\varepsilon $, ${p_{2}}=\cdots ={p_{N}}=\frac{1-\varepsilon }{N-1}$. Then we have
\[\begin{aligned}{}{\mathcal{H}_{0}^{{^{\prime\prime }}}}(p)& =2\log N+\frac{1}{N}{\sum \limits_{k=1}^{N}}{(\log {p_{k}})^{2}}-\frac{1}{{N^{2}}}{\left({\sum \limits_{k=1}^{N}}\log {p_{k}}\right)^{2}}+\frac{2}{N}{\sum \limits_{k=1}^{N}}\log {p_{k}}\\ {} & =2\log N+\frac{N-1}{N}{\left(\log \frac{1-\varepsilon }{N-1}\right)^{2}}+\frac{1}{N}{(\log \varepsilon )^{2}}\\ {} & -\frac{1}{{N^{2}}}{\left((N-1)\log \frac{1-\varepsilon }{N-1}+\log \varepsilon \right)^{2}}+\frac{2(N-1)}{N}\log \frac{1-\varepsilon }{N-1}+\frac{2}{N}\log \varepsilon \\ {} & =\left(\frac{1}{N}-\frac{1}{{N^{2}}}\right){\left(\log \varepsilon \right)^{2}}+\left(\frac{2}{N}-\frac{2(N-1)}{{N^{2}}}\log \frac{1-\varepsilon }{N-1}\right)\log \varepsilon +2\log N\\ {} & +\left(\frac{N-1}{N}-\frac{{(N-1)^{2}}}{{N^{2}}}\right){\left(\log \frac{1-\varepsilon }{N-1}\right)^{2}}\\ {} & +\frac{2(N-1)}{N}\log \frac{1-\varepsilon }{N-1}\to +\infty ,\hspace{0.2778em}\varepsilon \to 0+.\end{aligned}\]
□
2.3 Superposition of entropy that is convex
Now we establish that the superposition of entropy with some decreasing function is convex. Namely, we shall consider the function
(6)
\[ {\mathcal{G}_{\beta }}(p)=-{\mathcal{H}_{1+\frac{1}{\beta }}}(p)=\beta \log \left({\sum \limits_{k=1}^{N}}{p_{k}^{1+1/\beta }}\right),\hspace{0.1667em}\beta >0,\]
and prove its convexity. Because here we use tools that do not involve differentiation, we can allow some probabilities to be zero. In order to prove the convexity, we start with the following simple and well-known result, whose proof is included for the reader's convenience.
Lemma 5.
For any measure space $(\mathcal{X},\Sigma ,\mu )$ and any measurable function f such that $f\in {L^{p}}(\mathcal{X},\Sigma ,\mu )$ for all p from some interval $[a,b]$, the norm ${\left\| f\right\| _{p}}={\left\| f\right\| _{{L^{p}}(\mathcal{X},\Sigma ,\mu )}}$ is log-convex as a function of $1/p$ on this interval.
Proof.
For any ${p_{1}},{p_{2}}>0$ and $\theta \in (0,1)$, denote $p={\big(\theta /{p_{1}}+(1-\theta )/{p_{2}}\big)^{-1}}$ and observe that
\[ \frac{\theta p}{{p_{1}}}+\frac{(1-\theta )p}{{p_{2}}}=1.\]
Therefore, by the Hölder inequality
\[\begin{aligned}{}{\left\| f\right\| _{p}^{p}}& ={\int _{\mathcal{X}}}|f(x){|^{\theta p}}\cdot |f(x){|^{(1-\theta )p}}\mu (dx)\\ {} & \le {\left({\int _{\mathcal{X}}}|f(x){|^{{p_{1}}}}\mu (dx)\right)^{\theta p/{p_{1}}}}{\left({\int _{\mathcal{X}}}|f(x){|^{{p_{2}}}}\mu (dx)\right)^{(1-\theta )p/{p_{2}}}},\end{aligned}\]
whence
\[ {\left\| f\right\| _{p}}\le {\left\| f\right\| _{{p_{1}}}^{\theta }}{\left\| f\right\| _{{p_{2}}}^{1-\theta }},\]
as required. □
Corollary 1.
For any probability vector $p=({p_{k}},1\le k\le N)$, the function ${\mathcal{G}_{\beta }}(p),\beta >0$, is convex.
Proof.
It follows from Lemma 5 by setting $\mathcal{X}=\left\{1,\dots ,N\right\}$, $\mu (A)={\textstyle\sum _{k\in A}}{p_{k}}$, $f(k)={p_{k}}$, $k\in \mathcal{X}$. □
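Corollary 1 is also easy to check numerically. The following sketch (an illustration of ours; the probability vector, which includes one zero entry, is an arbitrary choice) evaluates ${\mathcal{G}_{\beta }}(p)$ on a grid of β and verifies that the discrete second differences are nonnegative up to rounding errors.

```python
import numpy as np

def G(p, beta):
    """G_beta(p) = beta * log(sum_k p_k^{1+1/beta}); zero probabilities contribute nothing."""
    q = p[p > 0.0]
    return beta * np.log(np.sum(q ** (1.0 + 1.0 / beta)))

p = np.array([0.5, 0.3, 0.2, 0.0])        # an illustrative vector with one zero probability
betas = np.linspace(0.05, 20.0, 400)
vals = np.array([G(p, b) for b in betas])
second_diff = vals[:-2] - 2.0 * vals[1:-1] + vals[2:]   # discrete convexity test on the grid
print("min second difference:", second_diff.min())      # nonnegative up to rounding errors
```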
Remark 2.
It follows immediately from (6) that for the function
\[ {\mathcal{G}_{\beta }}(p)=\beta \log {\sum \limits_{k=1}^{N}}{p_{k}^{1+1/\beta }},\hspace{0.2778em}\beta >0,\]
$-{\mathcal{H}_{\alpha }}(p)={\mathcal{G}_{\frac{1}{\alpha -1}}}(p)$ holds for $\alpha >1$. Moreover, for $\alpha >1$ the map $\alpha \mapsto \frac{1}{\alpha -1}$ is convex. If there were a p such that ${\mathcal{G}_{\cdot }}(p)$ was nondecreasing on an interval, then ${\mathcal{G}_{\frac{1}{\alpha -1}}}(p)=-{\mathcal{H}_{\alpha }}(p)$ would be convex on the corresponding interval, i.e., ${\mathcal{H}_{\alpha }}(p)$ would be concave there. However,
\[\begin{array}{c}\displaystyle {\mathcal{G}^{\prime }_{\beta }}(p)=\log {\sum \limits_{k=1}^{N}}{p_{k}^{1+1/\beta }}-\frac{1}{\beta }\frac{{\textstyle\textstyle\sum _{k=1}^{N}}{p_{k}^{1+1/\beta }}\log {p_{k}}}{{\textstyle\textstyle\sum _{k=1}^{N}}{p_{k}^{1+1/\beta }}}\\ {} \displaystyle =-{\sum \limits_{k=1}^{N}}\frac{{p_{k}^{1+1/\beta }}}{{\textstyle\textstyle\sum _{k=1}^{N}}{p_{k}^{1+1/\beta }}}\log \frac{{p_{k}^{1/\beta }}}{{\textstyle\textstyle\sum _{k=1}^{N}}{p_{k}^{1+1/\beta }}}\le 0.\end{array}\]
where the last inequality follows from Jensen's inequality applied to the convex function $x\mapsto x\log x$ and the random variable taking the value ${p_{k}^{1/\beta }}$ with probability ${p_{k}}$. In some sense, this is the reason why we cannot say anything definite concerning the 2nd derivative of the entropy either on the whole positive semiaxis or even on the interval $[1,+\infty )$.
2.4 Graphs of ${\mathcal{H}_{\alpha }}(p)$ and its second derivative for several probability distributions
Fig. 1.
Graphs of ${\mathcal{H}_{\alpha }}(p)$ and ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$, where ${p_{1}}={p_{2}}=0.4$, ${p_{3}}=0.2$. Here ${\mathcal{H}_{\alpha }}(p)$ is convex as a function of $\alpha >0$
Fig. 2.
Graphs of ${\mathcal{H}_{\alpha }}(p)$ and ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$, where ${p_{1}}=\cdots ={p_{198}}=\frac{1}{400}$, ${p_{199}}={p_{200}}=\frac{101}{400}$. The dot marks the point where ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)=0$, namely $\alpha =0.99422$
Fig. 3.
Graphs of ${\mathcal{H}_{\alpha }}(p)$ and ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$, where ${p_{1}}=\cdots ={p_{10}}=0.01$, ${p_{11}}={p_{12}}=0.15$, ${p_{13}}={p_{14}}=0.3$. Here the second derivative becomes positive long before point 1 (at point 0.11318)
Fig. 4.
Graphs of ${\mathcal{H}_{\alpha }}(p)$ and ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$, where ${p_{1}}=\cdots ={p_{10}}=0.08$, ${p_{11}}=0.2$. Here the second derivative becomes positive after point 1 (at point 2.9997)
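Graphs such as those in Figures 1–4 can be reproduced with a few lines of numerical code. The sketch below (an illustration of ours) computes a finite-difference approximation of ${\mathcal{H}_{\alpha }^{{^{\prime\prime }}}}(p)$ for the distribution of Figure 2 and locates the points where the second derivative changes sign; one of them should lie near the value $\alpha =0.99422$ reported in the caption of Figure 2.

```python
import numpy as np

def renyi(p, alpha):
    """H_alpha(p), with the Shannon entropy as the continuous extension at alpha = 1."""
    if abs(alpha - 1.0) < 1e-9:
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def second_derivative(p, alpha, h=1e-3):
    """Central finite-difference approximation of d^2 H_alpha / d alpha^2."""
    return (renyi(p, alpha + h) - 2.0 * renyi(p, alpha) + renyi(p, alpha - h)) / h**2

# Distribution of Figure 2: 198 probabilities 1/400 and 2 probabilities 101/400.
p = np.array([1.0 / 400] * 198 + [101.0 / 400] * 2)
grid = np.arange(0.02, 3.0, 0.002)
H2 = np.array([second_derivative(p, a) for a in grid])
sign_change = grid[np.where(np.diff(np.sign(H2)) != 0)]
print("H'' changes sign near alpha =", sign_change)   # compare with 0.99422 from Figure 2
```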
3 Robustness of the Rényi entropy
Now we study the asymptotic behavior of the Rényi entropy depending on the behavior of the involved probabilities. The first problem is the stability of the entropy w.r.t. the involved probabilities and the rate of its convergence to the limit value when the probabilities tend to their limit values at a fixed rate.
3.1 Rate of convergence of the disturbed entropy when the initial distribution is arbitrary but fixed
Let us look at distributions that are “near” some fixed distribution $p=({p_{k}},\hspace{0.2778em}1\le k\le N)$ and construct the approximate distribution $p(\epsilon )=({p_{k}}(\epsilon ),\hspace{0.2778em}1\le k\le N)$ as follows. We can assume that some probabilities are zero, and we shall see that this assumption influences the rate of convergence of the Rényi entropy to the limit value. So, let $0\le {N_{1}}<N$ be the number of zero probabilities, and for them we consider approximate values of the form ${p_{k}}(\epsilon )={c_{k}}\varepsilon $, $0\le {c_{k}}\le 1$, $1\le k\le {N_{1}}$. Further, let ${N_{2}}=N-{N_{1}}\ge 1$ be the number of nonzero probabilities, and for them we consider approximate values of the form ${p_{k}}(\epsilon )={p_{k}}+{c_{k}}\varepsilon $, $|{c_{k}}|\le 1$, ${N_{1}}+1\le k\le N$, where ${c_{1}}+\cdots +{c_{N}}=0$ and $\varepsilon \le \underset{{N_{1}}+1\le k\le N}{\min }{p_{k}}$. Assume also that there exists $k\le N$ such that ${c_{k}}\ne 0$, otherwise ${\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))=0$. So, we disturb the initial probabilities linearly in ε with different weights whose sum should necessarily be zero. These assumptions imply that $0\le {p_{k}}(\epsilon )\le 1$ and ${p_{1}}(\epsilon )+\cdots +{p_{N}}(\epsilon )=1$. Now we want to find out how the entropy of the disturbed distribution will differ from the initial entropy, depending on the parameters ε, N and α. We start with $\alpha =1$.
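Before stating the precise asymptotics, we illustrate them numerically. The following sketch (an illustration of ours; the distributions and weights are arbitrary admissible choices) computes the normalized differences for $\alpha =1$ and shows that they approach the limits given in Theorem 2 below.

```python
import numpy as np

def shannon(p):
    """H_1(p) = -sum p_k log p_k, with the convention 0*log 0 = 0."""
    q = p[p > 0]
    return -np.sum(q * np.log(q))

# Case (i) of Theorem 2: one zero probability disturbed as c_1*eps with c_1 = 1.
p = np.array([0.0, 0.5, 0.3, 0.2]); c = np.array([1.0, -0.5, -0.3, -0.2])
for eps in (1e-2, 1e-3, 1e-4):
    ratio = (shannon(p) - shannon(p + c * eps)) / (eps * np.log(eps))
    print("eps=%.0e  ratio=%.4f  (limit c_1 = 1)" % (eps, ratio))

# Case (ii): no zero probabilities; the limit of the ratio over eps is sum_k c_k log p_k.
p = np.array([0.5, 0.3, 0.2]); c = np.array([0.5, -0.3, -0.2])
print("predicted limit:", np.sum(c * np.log(p)))
for eps in (1e-2, 1e-3, 1e-4):
    print("eps=%.0e  ratio=%.4f" % (eps, (shannon(p) - shannon(p + c * eps)) / eps))
```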
Theorem 2.
Let the number N and coefficients ${c_{1}},\dots ,{c_{N}}$ be fixed, and let $\alpha =1$. Then we have three different situations:
-
$(i)$ Let ${N_{1}}\ge 1$ and there exists $k\le {N_{1}}$ such that ${c_{k}}\ne 0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{\varepsilon \log \varepsilon }={\sum \limits_{k=1}^{{N_{1}}}}{c_{k}}.\]
-
$(ii)$ Let ${c_{k}}=0$ for all $k\le {N_{1}}$ and ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}\log {p_{k}}\ne 0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{\varepsilon }={\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}\log {p_{k}}.\]
-
$(iii)$ Let ${c_{k}}=0$ for all $k\le {N_{1}}$ and ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}\log {p_{k}}=0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{{\varepsilon ^{2}}}=\frac{1}{2}{\sum \limits_{k={N_{1}}+1}^{N}}\frac{{c_{k}^{2}}}{{p_{k}}}.\]
Proof.
First of all, we will find the asymptotic behavior of two auxiliary functions as $\varepsilon \to 0$. First, let $0\le {c_{k}}\le 1$. Then we immediately get that
\[ {c_{k}}\varepsilon \log ({c_{k}}\varepsilon )={c_{k}}\varepsilon \log \varepsilon +{c_{k}}\varepsilon \log {c_{k}}={c_{k}}\varepsilon \log \varepsilon +o(\varepsilon \log \varepsilon ),\hspace{0.2778em}\varepsilon \to 0.\]
Second, let ${p_{k}}>0$, $|{c_{k}}|\le 1$. Taking into account the Taylor expansion of the logarithm
\[ \log (1+x)=x-\frac{{x^{2}}}{2}+o({x^{2}}),\hspace{0.2778em}x\to 0,\]
we can write:
(7)
\[\begin{array}{l}\displaystyle ({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}}={c_{k}}\varepsilon \log {p_{k}}+({p_{k}}+{c_{k}}\varepsilon )\log (1+{c_{k}}{p_{k}^{-1}}\varepsilon )\\ {} \displaystyle ={c_{k}}\varepsilon \log {p_{k}}+({p_{k}}+{c_{k}}\varepsilon )\left({c_{k}}{p_{k}^{-1}}\varepsilon -\frac{1}{2}{({c_{k}}{p_{k}^{-1}}\varepsilon )^{2}}+o({\varepsilon ^{2}})\right)\\ {} \displaystyle =\varepsilon ({c_{k}}\log {p_{k}}+{c_{k}})+{\varepsilon ^{2}}\left({c_{k}^{2}}{p_{k}^{-1}}-\frac{1}{2}{c_{k}^{2}}{p_{k}^{-1}}\right)+o({\varepsilon ^{2}})\\ {} \displaystyle =\varepsilon ({c_{k}}\log {p_{k}}+{c_{k}})+\frac{{c_{k}^{2}}{\varepsilon ^{2}}}{2{p_{k}}}+o({\varepsilon ^{2}}),\hspace{0.2778em}\varepsilon \to 0.\end{array}\]
In particular,
\[ ({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}}=o(\varepsilon \log \varepsilon ),\hspace{0.2778em}\varepsilon \to 0,\]
and
\[ ({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}}=\varepsilon ({c_{k}}\log {p_{k}}+{c_{k}})+o(\varepsilon ),\hspace{0.2778em}\varepsilon \to 0\]
Now simply observe the following.
\[\begin{aligned}{}\hspace{-23.0pt}(i)\hspace{2em}\hspace{1em}& \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{\varepsilon \log \varepsilon }\\ {} & =\underset{\varepsilon \to 0}{\lim }\frac{1}{\varepsilon \log \varepsilon }{\sum \limits_{k=1}^{{N_{1}}}}{c_{k}}\varepsilon \log {c_{k}}\varepsilon \\ {} & \hspace{1em}+\frac{1}{\varepsilon \log \varepsilon }{\sum \limits_{k={N_{1}}+1}^{N}}(({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}})\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{\varepsilon \log \varepsilon }\left({\sum \limits_{k=1}^{{N_{1}}}}({c_{k}}\varepsilon \log \varepsilon +o(\varepsilon \log \varepsilon ))+{\sum \limits_{k={N_{1}}+1}^{N}}o(\varepsilon \log \varepsilon )\right)\\ {} & \hspace{1em}={\sum \limits_{k=1}^{{N_{1}}}}{c_{k}}.\end{aligned}\]
$(ii)$ Since ${c_{k}}=0$ for any $k\le {N_{1}}$ and the total sum ${c_{1}}+\cdots +{c_{N}}=0$, we have ${c_{{N_{1}}+1}}+\cdots +{c_{N}}=0$. Furthermore, in this case
\[\begin{aligned}{}\underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{\varepsilon }& =\underset{\varepsilon \to 0}{\lim }\frac{1}{\varepsilon }{\sum \limits_{k={N_{1}}+1}^{N}}(({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}})\\ {} & =\underset{\varepsilon \to 0}{\lim }\frac{1}{\varepsilon }{\sum \limits_{k={N_{1}}+1}^{N}}(\varepsilon ({c_{k}}\log {p_{k}}+{c_{k}})+o(\varepsilon ))\\ {} & ={\sum \limits_{k={N_{1}}+1}^{N}}({c_{k}}\log {p_{k}}+{c_{k}})={\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}\log {p_{k}}.\end{aligned}\]
$(iii)$ In this case we have the following relations:
\[\begin{aligned}{}\underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{1}}(p)-{\mathcal{H}_{1}}(p(\epsilon ))}{{\varepsilon ^{2}}}& =\underset{\varepsilon \to 0}{\lim }\frac{1}{{\varepsilon ^{2}}}{\sum \limits_{k={N_{1}}+1}^{N}}(({p_{k}}+{c_{k}}\varepsilon )\log ({p_{k}}+{c_{k}}\varepsilon )-{p_{k}}\log {p_{k}})\\ {} & =\underset{\varepsilon \to 0}{\lim }\frac{1}{{\varepsilon ^{2}}}{\sum \limits_{k={N_{1}}+1}^{N}}(\varepsilon ({c_{k}}\log {p_{k}}+{c_{k}})+\frac{{c_{k}^{2}}{\varepsilon ^{2}}}{2{p_{k}}}+o({\varepsilon ^{2}}))\\ {} & =\underset{\varepsilon \to 0}{\lim }\frac{1}{{\varepsilon ^{2}}}{\sum \limits_{k={N_{1}}+1}^{N}}\left(\frac{{c_{k}^{2}}{\varepsilon ^{2}}}{2{p_{k}}}+o({\varepsilon ^{2}})\right)=\frac{1}{2}{\sum \limits_{k={N_{1}}+1}^{N}}\frac{{c_{k}^{2}}}{{p_{k}}}.\end{aligned}\]
The theorem is proved. □
Now we proceed with $\alpha <1$.
Theorem 3.
Let the number N and coefficients ${c_{1}},\dots ,{c_{N}}$ be fixed, and let $\alpha <1$. Then we have three different situations:
-
$(i)$ Let ${N_{1}}\ge 1$ and there exists $k\le {N_{1}}$ such that ${c_{k}}\ne 0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{\alpha }}}=\frac{1}{\alpha -1}\left({\sum \limits_{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
-
$(ii)$ Let ${c_{k}}=0$ for all $k\le {N_{1}}$ and ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\ne 0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{\varepsilon }=\frac{\alpha }{\alpha -1}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
-
$(iii)$ Let ${c_{k}}=0$ for all $k\le {N_{1}}$ and ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$. Then
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
Proof.
Similarly to the proof of Theorem 2, we start with several asymptotic relations as $\varepsilon \to 0$. Namely, let ${p_{k}}>0$, $|{c_{k}}|\le 1$. Taking into account the Taylor expansion of ${(1+x)^{\alpha }}$ that has the form
\[ {(1+x)^{\alpha }}=1+\alpha x+o(x),\hspace{0.2778em}x\to 0,\]
we can write:
(8)
\[ \begin{array}{c}\displaystyle \alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}=\alpha {c_{k}}{p_{k}^{\alpha -1}}{(1+{c_{k}}{p_{k}^{-1}}\varepsilon )^{\alpha -1}}\\ {} \displaystyle =\alpha {c_{k}}{p_{k}^{\alpha -1}}(1+(\alpha -1){c_{k}}{p_{k}^{-1}}\varepsilon +o(\varepsilon ))\\ {} \displaystyle =\alpha {c_{k}}{p_{k}^{\alpha -1}}+\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ),\hspace{0.2778em}\varepsilon \to 0.\end{array}\]
As a consequence, we get the following asymptotic relations:
\[ \alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}=o({\varepsilon ^{\alpha -1}}),\hspace{0.2778em}\varepsilon \to 0,\]
and
(9)
\[ \alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}=\alpha {c_{k}}{p_{k}^{\alpha -1}}+o(1),\hspace{0.2778em}\varepsilon \to 0.\]
$(i)$ Applying L’Hospital’s rule, we get:
\[\begin{aligned}{}\underset{\varepsilon \to 0}{\lim }\frac{{H_{\alpha }}(p)-{H_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{\alpha }}}& =\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1){\varepsilon ^{\alpha }}}\log \left({\sum \limits_{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)\\ {} & -\frac{1}{(\alpha -1){\varepsilon ^{\alpha }}}\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{\alpha {\varepsilon ^{\alpha -1}}}\\ {} & \times \frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}\alpha {c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & =\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}+{\varepsilon ^{1-\alpha }}{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}o({\varepsilon ^{\alpha -1}})}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & =\frac{1}{\alpha -1}\left({\sum \limits_{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
$(ii)$ In this case we can transform the expression under the limit as follows:
\[\begin{aligned}{}& \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{\varepsilon }\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1)\varepsilon }\left(\log \left({\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)-\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)\right)\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha {c_{k}}{p_{k}^{\alpha -1}}+o(1))}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{\alpha }{\alpha -1}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
$(iii)$ Finally, in the 3rd case,
\[\begin{aligned}{}& \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1){\varepsilon ^{2}}}\left(\log \left({\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)-\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)\right)\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha {c_{k}}{p_{k}^{\alpha -1}}+\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
The theorem is proved. □
Finally, we consider the case $\alpha >1$. Here, five different asymptotics are possible.
Theorem 4.
Let the number N and coefficients ${c_{1}},\dots ,{c_{N}}$ be fixed, and let $\alpha >1$. Then five different situations are possible:
-
$(i)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\ne 0$. Then for any ${N_{1}}\ge 0$ and $\alpha >1$, we have that
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{\varepsilon }=\frac{\alpha }{\alpha -1}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
-
$(ii)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, ${N_{1}}\ge 1$, and there exists $k\le {N_{1}}$ such that ${c_{k}}\ne 0$. Then for $\alpha <2$ it holds that
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{\alpha }}}=\frac{1}{\alpha -1}\left({\sum \limits_{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
-
$(iii)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, ${N_{1}}\ge 0$ and for all $k\le {N_{1}}$ we have that ${c_{k}}=0$. Then for $\alpha <2$ it holds that
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
-
$(iv)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, $\alpha =2$. Then for any ${N_{1}}\ge 0$ and ${c_{k}},\hspace{0.2778em}k\le {N_{1}}$, we have that
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{2}}(p)-{\mathcal{H}_{2}}(p(\epsilon ))}{{\varepsilon ^{2}}}=\left({\sum \limits_{k=1}^{N}}{c_{k}^{2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{2}}\right)^{-1}}.\]
-
$(v)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, $\alpha >2$. Then for any ${N_{1}}\ge 0$ and ${c_{k}},\hspace{0.2778em}k\le {N_{1}}$, we have that
\[ \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\]
Proof.
As in the proof of Theorem 3, we shall use expansions (8) and (9). The main tool will be L’Hospital’s rule.
$(i)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\ne 0$. Then for any ${N_{1}}\ge 0$ and $\alpha >1$, we have the following relations:
\[\begin{aligned}{}\underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{\varepsilon }& =\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1)\varepsilon }\log \left({\sum \limits_{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)\\ {} & -\frac{1}{(\alpha -1)\varepsilon }\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)=\frac{1}{\alpha -1}\\ {} & \times \underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}\alpha {c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & =\frac{\alpha }{\alpha -1}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
$(ii)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, ${N_{1}}\ge 1$, and there exists $k\le {N_{1}}$ such that ${c_{k}}\ne 0$. Then for $\alpha <2$ we have that
\[\begin{aligned}{}& \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{\alpha }}}\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1){\varepsilon ^{\alpha }}}\log \left({\sum \limits_{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)\\ {} & \hspace{1em}-\frac{1}{(\alpha -1){\varepsilon ^{\alpha }}}\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{\alpha {\varepsilon ^{\alpha -1}}}\\ {} & \hspace{1em}\times \frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}\alpha {c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}((\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{{\varepsilon ^{\alpha -1}}\left({\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\left({\sum \limits_{k=1}^{{N_{1}}}}{c_{k}^{\alpha }}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
$(iii)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, ${N_{1}}\ge 0$ and for all $k\le {N_{1}}$ we have that ${c_{k}}=0$. Then for $\alpha <2$ it holds that
\[\begin{aligned}{}& \hspace{1em}\underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1){\varepsilon ^{2}}}\left(\log \left({\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)-\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)\right)\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha {c_{k}}{p_{k}^{\alpha -1}}+\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{2\varepsilon \left({\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{2\varepsilon \left({\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)}\\ {} & \hspace{1em}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
$(iv)$ Obviously, in the case $\alpha =2$ we have the simple expression for the entropy:
\[ {\mathcal{H}_{2}}(p)=-\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{2}}\right).\]
Therefore, if ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, $\alpha =2$, then for any ${N_{1}}\ge 0$ and ${c_{k}},\hspace{0.2778em}k\le {N_{1}}$, we have that
\[\begin{aligned}{}\underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{2}}(p)-{\mathcal{H}_{2}}(p(\epsilon ))}{{\varepsilon ^{2}}}& =\underset{\varepsilon \to 0}{\lim }\frac{1}{{\varepsilon ^{2}}}\log \left({\sum \limits_{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{2}}+{\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{2}}\right)\\ {} & -\frac{1}{{\varepsilon ^{2}}}\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{2}}\right)=\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\\ {} & \times \frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}2{c_{k}^{2}}\varepsilon +{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}2{c_{k}}({p_{k}}+{c_{k}}\varepsilon )}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{2}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{2}}}\\ {} & =\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{c_{k}^{2}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}^{2}}}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{2}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{2}}}\\ {} & =\left({\sum \limits_{k=1}^{N}}{c_{k}^{2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{2}}\right)^{-1}}.\end{aligned}\]
$(v)$ Let ${\textstyle\sum _{k={N_{1}}+1}^{N}}{c_{k}}{p_{k}^{\alpha -1}}=0$, $\alpha >2$. Then for any ${N_{1}}\ge 0$ and ${c_{k}},\hspace{0.2778em}k\le {N_{1}}$, we have that
\[\begin{aligned}{}& \underset{\varepsilon \to 0}{\lim }\frac{{\mathcal{H}_{\alpha }}(p)-{\mathcal{H}_{\alpha }}(p(\epsilon ))}{{\varepsilon ^{2}}}\\ {} & \hspace{1em}=\underset{\varepsilon \to 0}{\lim }\frac{1}{(\alpha -1){\varepsilon ^{2}}}\log \left({\sum \limits_{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\sum \limits_{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)\\ {} & \hspace{1em}-\frac{1}{(\alpha -1){\varepsilon ^{2}}}\log \left({\sum \limits_{k=1}^{N}}{p_{k}^{\alpha }}\right)=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{1}{2\varepsilon }\\ {} & \hspace{1em}\times \frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}\alpha {c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}\alpha {c_{k}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha -1}}}{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}}\\ {} & \hspace{1em}=\frac{1}{\alpha -1}\underset{\varepsilon \to 0}{\lim }\frac{{\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}\alpha {c_{k}^{\alpha }}{\varepsilon ^{\alpha -1}}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}(\alpha (\alpha -1){c_{k}^{2}}{p_{k}^{\alpha -2}}\varepsilon +o(\varepsilon ))}{2\varepsilon \left({\textstyle\textstyle\sum _{k=1}^{{N_{1}}}}{({c_{k}}\varepsilon )^{\alpha }}+{\textstyle\textstyle\sum _{k={N_{1}}+1}^{N}}{({p_{k}}+{c_{k}}\varepsilon )^{\alpha }}\right)}\\ {} & \hspace{1em}=\frac{\alpha }{2}\left({\sum \limits_{k={N_{1}}+1}^{N}}{c_{k}^{2}}{p_{k}^{\alpha -2}}\right){\left({\sum \limits_{k={N_{1}}+1}^{N}}{p_{k}^{\alpha }}\right)^{-1}}.\end{aligned}\]
The theorem is proved. □
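The rates established in Theorems 3 and 4 are also easy to observe numerically. The sketch below (an illustration of ours, with an arbitrary admissible choice of p and c and no zero probabilities) checks the limit $\frac{\alpha }{\alpha -1}\big({\textstyle\sum _{k}}{c_{k}}{p_{k}^{\alpha -1}}\big)\big({\textstyle\sum _{k}}{p_{k}^{\alpha }}\big)^{-1}$ for one $\alpha <1$ (Theorem 3$(ii)$) and one $\alpha >1$ (Theorem 4$(i)$).

```python
import numpy as np

def renyi(p, alpha):
    """H_alpha(p) for alpha != 1 (natural logarithm)."""
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

# A distribution without zero probabilities (N_1 = 0) and an admissible disturbance direction.
p = np.array([0.5, 0.3, 0.2])
c = np.array([0.5, -0.3, -0.2])          # |c_k| <= 1 and the weights sum to zero
for alpha in (0.5, 1.5):                 # Theorem 3(ii) for alpha < 1, Theorem 4(i) for alpha > 1
    limit = alpha / (alpha - 1.0) * np.sum(c * p**(alpha - 1)) / np.sum(p**alpha)
    print("alpha=%.1f  predicted limit %.5f" % (alpha, limit))
    for eps in (1e-2, 1e-3, 1e-4):
        ratio = (renyi(p, alpha) - renyi(p + c * eps, alpha)) / eps
        print("   eps=%.0e  ratio=%.5f" % (eps, ratio))
```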
3.2 Convergence of the disturbed entropy when the initial distribution is uniform but the number of events increases to ∞
The second problem is to establish conditions for the stability of the entropy of the uniform distribution when the number of events tends to ∞. Let $N>1$, let ${p_{N}}(uni)=(\frac{1}{N},\dots ,\frac{1}{N})$ be the vector of the uniform distribution with N possible states, let $\varepsilon =\varepsilon (N)\le \frac{1}{N}$, and let $\{{c_{kN}};\hspace{0.2778em}N\ge 1,\hspace{0.2778em}1\le k\le N\}$ be a family of fixed numbers (not all zero) such that $|{c_{kN}}|\le 1$ and ${\textstyle\sum _{k=1}^{N}}{c_{kN}}=0$. Note that for any $N\ge 1$ the numbers ${c_{kN}}$ are strictly positive for some k. Consider the disturbed distribution vector ${p_{N}^{{^{\prime }}}}=(\frac{1}{N}+{c_{1N}}\varepsilon ,\dots ,\frac{1}{N}+{c_{NN}}\varepsilon )$.
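The following sketch (an illustration of ours; the choices $\varepsilon (N)={N^{-2}}$ and alternating weights $\pm 1$ are arbitrary admissible ones) shows numerically that the entropy differences vanish as $N\to \infty $ once $N\varepsilon (N)\to 0$, which is the stability property established below.

```python
import numpy as np

def renyi(p, alpha):
    if abs(alpha - 1.0) < 1e-12:
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

# Disturb the uniform distribution: c_{kN} = +1/-1 alternating, eps(N) = N^{-2}, so N*eps -> 0.
for N in (10, 100, 1000, 10000):
    c = np.tile([1.0, -1.0], N // 2)           # the weights sum to zero (N is even)
    eps = 1.0 / N**2
    p_uni = np.full(N, 1.0 / N)
    p_dist = p_uni + c * eps
    gaps = [renyi(p_uni, a) - renyi(p_dist, a) for a in (0.5, 1.0, 2.0)]
    print(N, ["%.2e" % g for g in gaps])       # nonnegative and decreasing to zero
```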
Theorem 5.
Let $N\varepsilon (N)\to 0$ as $N\to \infty $. Then for any $\alpha >0$,
\[ {\mathcal{H}_{\alpha }}({p_{N}}(uni))-{\mathcal{H}_{\alpha }}({p_{N}^{{^{\prime }}}})\to 0,\hspace{0.2778em}N\to \infty .\]
Proof.
We know that $N\varepsilon \to 0$, as $N\to \infty $, and the family of numbers $\{{c_{kn}};\hspace{0.2778em}n\ge 1,\hspace{0.2778em}1\le k\le n\}$ is bounded. Therefore the values
\[ \underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\to 1,\hspace{0.2778em}\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\inf }(1+N{c_{kn}}\varepsilon )\to 1,\hspace{0.2778em}N\to \infty ,\]
as functions of N, and for every $N\ge 1$, $\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\ge 1$. Recall that the function $x\log x$ is increasing for $x\ge 1$, and $x\log x\le 0$ for $0<x<1$. Moreover, the Rényi entropy attains its maximum at the uniform distribution. As a consequence of all these observations and assumptions we get that
\[\begin{aligned}{}0\le {\mathcal{H}_{1}}({p_{N}})-{\mathcal{H}_{1}}({p_{N}^{{^{\prime }}}})& =\frac{1}{N}{\sum \limits_{k=1}^{N}}(1+N{c_{kN}}\varepsilon )\log (1+N{c_{kN}}\varepsilon )\\ {} & \le \frac{1}{N}{\sum \limits_{k=1}^{N}}\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\log \underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\\ {} & =\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\log \underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\\ {} & \to 0,\hspace{0.2778em}N\to \infty .\end{aligned}\]
Let $\alpha >1$. Then
\[\begin{aligned}{}0\le {\mathcal{H}_{\alpha }}({p_{N}})-{\mathcal{H}_{\alpha }}({p_{N}^{{^{\prime }}}})& =\frac{1}{\alpha -1}\log \left(\frac{1}{N}{\sum \limits_{k=1}^{N}}{(1+N{c_{kN}}\varepsilon )^{\alpha }}\right)\\ {} & \le \frac{1}{\alpha -1}\log \left(\frac{1}{N}{\sum \limits_{k=1}^{N}}{\left(\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\right)^{\alpha }}\right)\\ {} & =\frac{\alpha }{\alpha -1}\log \left(\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\sup }(1+N{c_{kn}}\varepsilon )\right)\to 0,\hspace{0.2778em}N\to \infty .\end{aligned}\]
Similarly, for $0<\alpha <1$ we produce the transformations
\[\begin{aligned}{}0\le {\mathcal{H}_{\alpha }}({p_{N}})-{\mathcal{H}_{\alpha }}({p_{N}^{{^{\prime }}}})& =\frac{1}{\alpha -1}\log \left(\frac{1}{N}{\sum \limits_{k=1}^{N}}{(1+N{c_{kN}}\varepsilon )^{\alpha }}\right)\\ {} & \le \frac{1}{\alpha -1}\log \left(\frac{1}{N}{\sum \limits_{k=1}^{N}}{\left(\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\inf }(1+N{c_{kn}}\varepsilon )\right)^{\alpha }}\right)\\ {} & =\frac{\alpha }{\alpha -1}\log \left(\underset{n\ge 1,\hspace{0.2778em}1\le k\le n}{\inf }(1+N{c_{kn}}\varepsilon )\right)\to 0,\hspace{0.2778em}N\to \infty ,\end{aligned}\]
and the proof follows. □
3.3 Binomial and Poisson distributions
In this section we look at the convergence of the Rényi entropy of the binomial distribution to the Rényi entropy of the Poisson distribution.
Theorem 6.
Let $\lambda >0$ and $\alpha >0$ be fixed. Then
\[ \underset{n\to \infty }{\lim }{\mathcal{H}_{\alpha }}\left(B\left(n,\frac{\lambda }{n}\right)\right)={\mathcal{H}_{\alpha }}(Poi(\lambda )).\]
Proof.
First, let $\alpha =1$. We will compute and regroup the entropies of the binomial and Poisson distributions.
\[\begin{aligned}{}{\mathcal{H}_{1}}\left(B\left(n,p\right)\right)& =-{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\log \left(\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\right)\\ {} & =-{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\log \left(\genfrac{}{}{0pt}{}{n}{k}\right)\\ {} & -n\left(p\log p+(1-p)\log (1-p)\right)\\ {} & =-{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}(\log n!-\log k!-\log (n-k)!)+np\log n\\ {} & -np\log np-n\log (1-p)+np\log (1-p)\\ {} & =np\log (1-p)-n\log (1-p)-np\log np+np\log n\\ {} & -{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}(\log n!-\log k!-\log (n-k)!).\\ {} {\mathcal{H}_{1}}(Poi(\lambda ))& =-{\sum \limits_{k=0}^{\infty }}{e^{-\lambda }}\frac{{\lambda ^{k}}}{k!}\log \left({e^{-\lambda }}\frac{{\lambda ^{k}}}{k!}\right)=\lambda -\lambda \log \lambda +{\sum \limits_{k=0}^{\infty }}{e^{-\lambda }}\frac{{\lambda ^{k}}}{k!}\log k!\end{aligned}\]
We want to show the convergence of the entropies term by term. For that, let us take $np=\lambda $ and observe that
\[\begin{array}{l}\displaystyle np\log (1-p)=\lambda \log (1-p)\to \lambda \log 1=0,\hspace{0.2778em}n\to \infty ,\hspace{0.2778em}p\to 0.\\ {} \displaystyle -n\log (1-p)=\log {\left(1-\frac{\lambda }{n}\right)^{-n}}\to \lambda ,\hspace{0.2778em}n\to \infty ,\hspace{0.2778em}p\to 0.\\ {} \displaystyle -np\log np=-\lambda \log \lambda ,\end{array}\]
\[\begin{aligned}{}np\log n& -{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}(\log n!-\log k!-\log (n-k)!)\\ {} & ={\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}k\log n\\ {} & -{\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}(\log n!-\log k!-\log (n-k)!)\\ {} & ={\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\left(\log {n^{k}}-\log n!+\log k!+\log (n-k)!\right)\\ {} & ={\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\left(\log \frac{{n^{k}}(n-k)!}{n!}+\log k!\right).\end{aligned}\]
It is well known that $\frac{\log x}{x}\le 1$ for $x>0$. Using this fact, we get the following estimate:
\[\begin{aligned}{}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\log \frac{{n^{k}}(n-k)!}{n!}& =\frac{n!}{(n-k)!k!}{\left(\frac{\lambda }{n}\right)^{k}}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\log \frac{{n^{k}}(n-k)!}{n!}\\ {} & =\frac{{\lambda ^{k}}}{k!}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\frac{n!}{{n^{k}}(n-k)!}\log \frac{{n^{k}}(n-k)!}{n!}\\ {} & \le \frac{{\lambda ^{k}}}{k!}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\le \frac{{\lambda ^{k}}}{k!}.\end{aligned}\]
For the second part of the sum simply observe that
\[\begin{aligned}{}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\log k!& =\frac{n!}{(n-k)!k!}{\left(\frac{\lambda }{n}\right)^{k}}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\log k!\\ {} & =\frac{{\lambda ^{k}}}{k!}\log k!{\left(1-\frac{\lambda }{n}\right)^{n-k}}\frac{n!}{(n-k)!{n^{k}}}\\ {} & \le \frac{{\lambda ^{k}}}{k!}\log k!\end{aligned}\]
As ${\textstyle\sum _{k=0}^{\infty }}\frac{{\lambda ^{k}}}{k!}\left(1+\log k!\right)<\infty $, by Lebesgue’s dominated convergence theorem:
\[\begin{aligned}{}\underset{n\to \infty }{\lim }& {\sum \limits_{k=0}^{n}}\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\left(\log \frac{{n^{k}}(n-k)!}{n!}+\log k!\right)\\ {} & ={\sum \limits_{k=0}^{\infty }}\underset{n\to \infty }{\lim }\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\left(\log \frac{{n^{k}}(n-k)!}{n!}+\log k!\right)\\ {} & ={\sum \limits_{k=0}^{\infty }}\underset{n\to \infty }{\lim }\frac{{\lambda ^{k}}}{k!}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\frac{n!}{(n-k)!{n^{k}}}\left(\log \frac{{n^{k}}(n-k)!}{n!}+\log k!\right)\\ {} & ={\sum \limits_{k=0}^{\infty }}{e^{-\lambda }}\frac{{\lambda ^{k}}}{k!}\log k!\end{aligned}\]
Finally, we get that
\[ \underset{n\to \infty }{\lim }{\mathcal{H}_{1}}\left(B\left(n,\frac{\lambda }{n}\right)\right)={\mathcal{H}_{1}}(Poi(\lambda )).\]
For $\alpha \ne 1$ we have:
\[\begin{array}{l}\displaystyle {\mathcal{H}_{\alpha }}\left(B\left(n,p\right)\right)=\frac{1}{1-\alpha }\log {\sum \limits_{k=0}^{n}}{\left(\left(\genfrac{}{}{0pt}{}{n}{k}\right){p^{k}}{(1-p)^{n-k}}\right)^{\alpha }},\\ {} \displaystyle {\mathcal{H}_{\alpha }}(Poi(\lambda ))=\frac{1}{1-\alpha }\log {\sum \limits_{k=0}^{+\infty }}{\left({e^{-\lambda }}\frac{{\lambda ^{k}}}{k!}\right)^{\alpha }}.\end{array}\]
Thus, to show that
\[ \underset{n\to \infty }{\lim }{\mathcal{H}_{\alpha }}\left(B\left(n,\frac{\lambda }{n}\right)\right)={\mathcal{H}_{\alpha }}(Poi(\lambda )),\]
it is enough to show the convergence of the sums, which follows from Lebesgue's dominated convergence theorem and the bound
\[ \left(\genfrac{}{}{0pt}{}{n}{k}\right){\left(\frac{\lambda }{n}\right)^{k}}{\left(1-\frac{\lambda }{n}\right)^{n-k}}=\frac{{\lambda ^{k}}}{k!}{\left(1-\frac{\lambda }{n}\right)^{n-k}}\frac{n!}{(n-k)!{n^{k}}}\le \frac{{\lambda ^{k}}}{k!},\]
together with ${\textstyle\sum _{k=0}^{\infty }}{\left(\frac{{\lambda ^{k}}}{k!}\right)^{\alpha }}<\infty $.
□
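The convergence just proved is easy to observe numerically. The sketch below (an illustration of ours; it assumes that the SciPy library is available and truncates the Poisson distribution at $k=200$, which is harmless for $\lambda =3$) compares the Rényi entropy of $B(n,\lambda /n)$ with that of $Poi(\lambda )$ for growing n.

```python
import numpy as np
from scipy.stats import binom, poisson

def renyi_from_pmf(pmf, alpha):
    """Renyi entropy of a discrete distribution given by its probability masses."""
    pmf = pmf[pmf > 0]
    if abs(alpha - 1.0) < 1e-12:
        return -np.sum(pmf * np.log(pmf))
    return np.log(np.sum(pmf ** alpha)) / (1.0 - alpha)

lam, alpha = 3.0, 0.7
ks = np.arange(0, 200)                       # truncation; the tail beyond 200 is negligible here
target = renyi_from_pmf(poisson.pmf(ks, lam), alpha)
for n in (10, 100, 1000, 10000):
    val = renyi_from_pmf(binom.pmf(np.arange(0, n + 1), n, lam / n), alpha)
    print("n=%5d  H_alpha(Binomial)=%.6f  (Poisson value %.6f)" % (n, val, target))
```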