Modern Stochastics: Theory and Applications


Weighted entropy: basic inequalities
Volume 4, Issue 3 (2017), pp. 233–252
Mark Kelbert, Izabella Stuhl, Yuri Suhov

https://doi.org/10.15559/17-VMSTA85
Pub. online: 2 October 2017      Type: Research Article      Open Access

Received: 30 August 2017
Revised: 18 September 2017
Accepted: 18 September 2017
Published: 2 October 2017

Abstract

This paper represents an extended version of an earlier note [10]. The concept of weighted entropy takes into account the values of different outcomes, i.e., makes entropy context-dependent, through a weight function. We analyse analogs of the Fisher information inequality and the entropy power inequality for the weighted entropy, and discuss connections with the weighted Lieb splitting inequality. The concepts of rates of weighted entropy and information are also discussed.

1 Introduction

This paper represents an extended version of an earlier note [10]. We also follow earlier publications discussing related topics: [20, 21, 19, 18]. The Shannon entropy (SE) of a probability distribution p or the Shannon differential entropy (SDE) of a probability density function (PDF) f
(1.1)
\[h(\mathbf{p})=-\sum \limits_{i}p(x_{i})\log p(x_{i}),\hspace{2em}h(f)=-\int f(x)\log f(x)\mathrm{d}x\]
is context-free, i.e., it does not depend on the nature of the outcomes $x_{i}$ or x, but only upon the probabilities $p(x_{i})$ or values $f(x)$. This gives the notion of entropy great flexibility, which explains its successful applications. However, in many situations it seems insufficient, and the context-free property appears as a drawback. Viz., suppose you learn news about severe weather conditions in an area far away from where you live. Such conditions happen rarely; an event like this has a small probability $p\ll 1$ and conveys a large amount of information $-\log p$. At the same time you hear that a tree near your parking lot in town has fallen and damaged a number of cars. The probability of this event is also low, so the amount of information is again high. However, the value of this information to you is higher than that of the first event. Considerations of this kind motivate the study of weighted information and entropy, making them context-dependent.
Definition 1.1.
Let us define the weighted entropy (WE) as
(1.2)
\[{h_{\varphi }^{\mathrm{w}}}(\mathbf{p})=-\sum \limits_{i}\varphi (x_{i})p(x_{i})\log p(x_{i}).\]
Here a non-negative weight function (WF) $x_{i}\mapsto \varphi (x_{i})$ is introduced, representing the value/utility of an outcome $x_{i}$. A similar approach can be used for the differential entropy of a PDF f. Define the weighted differential entropy (WDE) as
(1.3)
\[{h_{\varphi }^{\mathrm{w}}}(f)=-\int \varphi (\mathbf{x})f(\mathbf{x})\log f(\mathbf{x})\mathrm{d}\mathbf{x}.\]
An initial example of a WF φ may be $\varphi (\mathbf{x})=\mathbf{1}$ ($\mathbf{x}\in A$) where A is a particular subset of outcomes (an event). A heuristic use of the WE with such a WF was demonstrated in [4, 5]. Another example repeatedly used below is $f(\mathbf{x})={f_{C}^{\mathrm{No}}}(\mathbf{x})$, a d-dimensional Gaussian PDF with mean $\mathbf{0}$ and covariance matrix C. Here
(1.4)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {h_{\varphi }^{\mathrm{w}}}\big({f_{C}^{\mathrm{No}}}\big)& \displaystyle =\frac{\alpha _{\varphi }(C)}{2}\log \big[{(2\pi )}^{d}\mathrm{det}(C)\big]+\frac{\log e}{2}\mathrm{tr}\big[{C}^{-1}\varPhi _{C,\varphi }\big]\hspace{1em}\text{where}\\{} \displaystyle \alpha _{\varphi }(C)& \displaystyle =\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x}){f_{C}^{\mathrm{No}}}(\mathbf{x})\mathrm{d}\mathbf{x},\hspace{1em}\varPhi _{C,\varphi }=\int _{{\mathbb{R}}^{d}}\mathbf{x}{\mathbf{x}}^{\mathrm{T}}\varphi (\mathbf{x}){f_{C}^{\mathrm{No}}}(\mathbf{x})\mathrm{d}\mathbf{x}.\end{array}\]
For $\varphi (\mathbf{x})=1$ we get the normal SDE $h({f_{C}^{\mathrm{No}}})=\frac{1}{2}\log [{(2\pi \mathrm{e})}^{d}\mathrm{det}\hspace{0.1667em}C]$.
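To make the definitions concrete, the following minimal sketch (with the assumed indicator WF $\varphi (x)=\mathbf{1}(x\in [0,2])$, $d=1$, and natural logarithms, so that $\log \mathrm{e}=1$) computes the WDE directly from (1.3) and compares it with the closed form (1.4).

```python
import numpy as np
from scipy.integrate import quad

sigma2 = 1.5                                      # variance C in d = 1
f = lambda x: np.exp(-x**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
phi = lambda x: 1.0 * ((x >= 0.0) & (x <= 2.0))   # indicator WF of the event A = [0, 2]

# WDE computed directly from the definition (1.3)
wde_def, _ = quad(lambda x: -phi(x) * f(x) * np.log(f(x)), -20, 20)

# Right-hand side of (1.4) in d = 1: alpha_phi(C) and Phi_{C,phi}
alpha, _ = quad(lambda x: phi(x) * f(x), -20, 20)
Phi, _ = quad(lambda x: x**2 * phi(x) * f(x), -20, 20)
wde_formula = 0.5 * alpha * np.log(2 * np.pi * sigma2) + 0.5 * Phi / sigma2

print(wde_def, wde_formula)    # the two values agree to quadrature accuracy
```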
In this note we give a brief introduction to the concept of weighted entropy. We do not always give proofs, referring the reader to the quoted original papers. Some basic properties of the WE and WDE have been presented in [20]; see also the references therein to early works on the subject. Applications of the WE and WDE to the security quantification of information systems are discussed in [15]. Other domains range from the stock market to image processing; see, e.g., [6, 9, 12, 14, 23, 26].
Throughout this note we assume that the series and integrals in (1.2)–(1.3) and the subsequent equations converge absolutely, without stressing this every time. To unify the presentation, we will often use integrals $\int _{\mathcal{X}}\mathrm{d}\mu $ relative to a reference σ-finite measure μ on a Polish space $\mathcal{X}$ with a Borel σ-algebra $\mathfrak{X}$. In this regard, the acronym PM/DF (probability mass/density function) will be employed. The usual measurability assumptions will also be in place for the rest of the presentation. We also assume that the WF satisfies $\varphi >0$ on an open set in $\mathcal{X}$.
In some parts of the presentation, the sums and integrals comprising a PM/DF will be written as expectations: this will make it easier to explain/use assumptions and properties involved. Viz., Eqns (1.2)–(1.3) can be given as ${h_{\varphi }^{\mathrm{w}}}(\mathbf{p})=-\mathbb{E}\varphi (\mathbf{X})\log \mathbf{p}(\mathbf{X})$ and ${h_{\varphi }^{\mathrm{w}}}(f)=-\mathbb{E}\varphi (\mathbf{X})\log f(\mathbf{X})$ where X is a random variable (RV) with the PM/DF p or f. Similarly, in (1.4), $\alpha _{\varphi }(C)=\mathbb{E}\varphi (\mathbf{X})$ and $\boldsymbol{\varPhi }_{C,\varphi }=\mathbb{E}\varphi (\mathbf{X})\mathbf{X}{\mathbf{X}}^{\mathrm{T}}$ where $\mathbf{X}\sim \mathrm{N}(\mathbf{0},C)$.

2 The weighted Gibbs inequality

Given two non-negative functions f, g (typically, PM/DFs), define the weighted Kullback-Leibler divergence (or the relative WE, briefly RWE) as
(2.1)
\[{D_{\varphi }^{\mathrm{w}}}(f\| g)=\int _{\mathcal{X}}\varphi (\mathbf{x})f(\mathbf{x})\log \frac{f(\mathbf{x})}{g(\mathbf{x})}\mathrm{d}\mu (\mathbf{x}).\]
Theorem 1.3 from [20] states:
Theorem 2.1.
Suppose that
(2.2)
\[\int _{\mathcal{X}}\varphi (\mathbf{x})\big[f(\mathbf{x})-g(\mathbf{x})\big]\mathrm{d}\mu (\mathbf{x})\ge 0.\]
Then ${D_{\varphi }^{\mathrm{w}}}(f\| g)\ge 0$. Moreover, ${D_{\varphi }^{\mathrm{w}}}(f\| g)=0$ iff $\varphi (\mathbf{x})[\frac{g(\mathbf{x})}{f(\mathbf{x})}-1]=0$ f-a.e.
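As a quick illustration (a sketch with assumed inputs, not taken from the original papers), the following check uses f = N(0,1), g = N(1,1) and the indicator WF $\varphi (x)=\mathbf{1}(x\le 0)$: condition (2.2) is satisfied, and the RWE (2.1) comes out non-negative, in line with Theorem 2.1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = norm(0, 1).pdf
g = norm(1, 1).pdf
phi = lambda x: 1.0 * (x <= 0.0)

cond, _ = quad(lambda x: phi(x) * (f(x) - g(x)), -20, 20)              # condition (2.2)
rwe, _ = quad(lambda x: phi(x) * f(x) * np.log(f(x) / g(x)), -20, 20)  # RWE (2.1)
print(cond, rwe)    # cond > 0, so (2.2) holds; rwe >= 0, as Theorem 2.1 asserts
```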
Example 2.2.
For an exponential family in the canonical form
(2.3)
\[f_{\underline{\theta }}(\mathbf{x})=h(\mathbf{x})\exp \big(\big\langle \underline{\theta },T(\mathbf{x})\big\rangle -A(\underline{\theta })\big),\hspace{1em}\mathbf{x}\in {\mathbb{R}}^{d},\hspace{2.5pt}\underline{\theta }\in {\mathbb{R}}^{m},\]
with the sufficient statistics $T(\mathbf{x})$ we have
(2.4)
\[{D_{\varphi }^{\mathrm{w}}}(f_{\underline{\theta }_{1}}\| f_{\underline{\theta }_{2}})={e}^{A_{\varphi }(\underline{\theta }_{1})-A(\underline{\theta }_{1})}\big(A(\underline{\theta }_{2})-A(\underline{\theta }_{1})-\big\langle \nabla A_{\varphi }(\underline{\theta }_{1}),\underline{\theta }_{2}-\underline{\theta }_{1}\big\rangle \big),\]
where ∇ stands for the gradient w.r.t. the parameter vector $\underline{\theta }$, and
(2.5)
\[A_{\varphi }(\underline{\theta })=\log \int \varphi (\mathbf{x})h(\mathbf{x})\exp \big(\big\langle \underline{\theta },T(\mathbf{x})\big\rangle \big)\mathrm{d}\mathbf{x}.\]
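The identity (2.4) can be verified numerically. The sketch below (an illustration with assumed choices) uses the exponential distributions $f_{\theta }(x)=\exp (\theta x-A(\theta ))$ on $(0,\infty )$, $\theta <0$, with $h\equiv 1$, $T(x)=x$, $A(\theta )=-\log (-\theta )$, and the WF $\varphi (x)={e}^{-x}$, for which $A_{\varphi }(\theta )=-\log (1-\theta )$. Note that (2.4) is an identity rather than an inequality, so the common value may be negative.

```python
import numpy as np
from scipy.integrate import quad

theta1, theta2 = -1.0, -2.0
A = lambda th: -np.log(-th)                     # log-partition function A(theta)
f = lambda x, th: np.exp(th * x - A(th))        # density f_theta(x) for x > 0
phi = lambda x: np.exp(-x)                      # assumed WF

# A_phi(theta) = log int_0^inf phi(x) exp(theta*x) dx = -log(1 - theta), gradient 1/(1 - theta)
A_phi = lambda th: -np.log(1.0 - th)
dA_phi = lambda th: 1.0 / (1.0 - th)

# Left-hand side: the RWE (2.1) computed directly
lhs, _ = quad(lambda x: phi(x) * f(x, theta1) * np.log(f(x, theta1) / f(x, theta2)), 0, 60)

# Right-hand side: formula (2.4)
rhs = np.exp(A_phi(theta1) - A(theta1)) * (
    A(theta2) - A(theta1) - dA_phi(theta1) * (theta2 - theta1))
print(lhs, rhs)    # both approximately -0.0966
```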

3 Concavity/convexity of the weighted entropy

Theorems 2.1 and 2.2 from [20] offer the following assertion:
Theorem 3.1.
(a) The WE/WDE functional $f\mapsto {h_{\varphi }^{\mathrm{w}}}(f)$ is concave in argument f. Namely, for any PM/DFs $f_{1}(x)$, $f_{2}(x)$ and $\lambda _{1},\lambda _{2}\in [0,1]$ such that $\lambda _{1}+\lambda _{2}=1$,
(3.1)
\[{h_{\varphi }^{\mathrm{w}}}(\lambda _{1}f_{1}+\lambda _{2}f_{2})\ge \lambda _{1}{h_{\varphi }^{\mathrm{w}}}(f_{1})+\lambda _{2}{h_{\varphi }^{\mathrm{w}}}(f_{2}).\]
Equality holds iff $\varphi (x)[f_{1}(x)-f_{2}(x)]=0$ for $(\lambda _{1}f_{1}+\lambda _{2}f_{2})$-a.a. x.
(b) However, the RWE functional $(f,g)\mapsto {D_{\varphi }^{\mathrm{w}}}(f\| g)$ is convex: given two pairs of PDFs $(f_{1},f_{2})$ and $(g_{1},g_{2})$,
(3.2)
\[\lambda _{1}{D_{\varphi }^{\mathrm{w}}}(f_{1}\| g_{1})+\lambda _{2}{D_{\varphi }^{\mathrm{w}}}(f_{2}\| g_{2})\ge {D_{\varphi }^{\mathrm{w}}}(\lambda _{1}f_{1}+\lambda _{2}f_{2}\| \lambda _{1}g_{1}+\lambda _{2}g_{2}),\]
with equality iff $\lambda _{1}\lambda _{2}=0$ or $\varphi (x)[f_{1}(x)-f_{2}(x)]=\varphi (x)[g_{1}(x)-g_{2}(x)]=0$ μ-a.e.
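A minimal numerical sketch of the concavity bound (3.1) (assumed inputs: two Gaussian PDFs, $\lambda _{1}=0.3$, and the WF $\varphi (x)=1+{x}^{2}$; natural logarithms):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f1, f2 = norm(0, 1).pdf, norm(2, 0.5).pdf
lam1, lam2 = 0.3, 0.7
phi = lambda x: 1.0 + x**2

def wde(f):
    # weighted differential entropy (1.3), natural logarithm
    return quad(lambda x: -phi(x) * f(x) * np.log(f(x)), -20, 20)[0]

mix = lambda x: lam1 * f1(x) + lam2 * f2(x)
print(wde(mix), lam1 * wde(f1) + lam2 * wde(f2))   # the first value is >= the second
```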

4 Weighted Ky-Fan and Hadamard inequalities

The map $C\mapsto \delta (C):=\log \mathrm{det}(C)$ gives a concave function of a (strictly) positive-definite $(d\times d)$ matrix C: $\delta (C)-\lambda _{1}\delta (C_{1})-\lambda _{2}\delta (C_{2})\ge 0$, where $C=\lambda _{1}C_{1}+\lambda _{2}C_{2}$, $\lambda _{1}+\lambda _{2}=1$ and $\lambda _{1,2}\ge 0$. This is the well-known Ky-Fan inequality. In terms of differential entropies it is equivalent to the bound
(4.1)
\[h\big({f_{C}^{\mathrm{No}}}\big)-\lambda _{1}h\big({f_{C_{1}}^{\mathrm{No}}}\big)-\lambda _{2}h\big({f_{C_{2}}^{\mathrm{No}}}\big)\ge 0\]
and is closely related to a maximising property of the Gaussian differential entropy $h({f_{C}^{\mathrm{No}}})$.
Theorem 4.1 below presents one of the new bounds of Ky-Fan type, in its most explicit form, for the WF $\varphi (\mathbf{x})=\exp ({\mathbf{x}}^{T}\mathbf{t}),\mathbf{t}\in {\mathbb{R}}^{d}$. Cf. Theorem 3.5 from [20]. In this case the identity ${h_{\varphi }^{\mathrm{w}}}({f}^{\mathrm{No}})=\exp (\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t})h({f}^{\mathrm{No}})$ holds true. Introduce the set
(4.2)
\[\mathcal{S}=\big\{\mathbf{t}\in {\mathbb{R}}^{d}:{F}^{(1)}(\mathbf{t})\ge 0,\hspace{2.5pt}{F}^{(2)}(\mathbf{t})\le 0\big\}.\]
Here functions ${F}^{(1)}$ and ${F}^{(2)}$ incorporate parameters $C_{i}$ and $\lambda _{i}$:
(4.3)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {F}^{(1)}(\mathbf{t})& \displaystyle =\sum \limits_{i=1}^{2}\lambda _{i}\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C_{i}\mathbf{t}\bigg)-\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t}\bigg),\hspace{1em}\mathbf{t}\in {\mathbb{R}}^{d},\\{} \displaystyle {F}^{(2)}(\mathbf{t})& \displaystyle =\Bigg[\sum \limits_{i=1}^{2}\lambda _{i}\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C_{i}\mathbf{t}\bigg)-\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t}\bigg)\Bigg]\log \big[{(2\pi )}^{d}\mathrm{det}(C)\big]\\{} & \displaystyle \hspace{1em}+\sum \limits_{i=1}^{2}\lambda _{i}\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C_{i}\mathbf{t}\bigg)\mathrm{tr}\big[{C}^{-1}C_{i}\big]-d\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t}\bigg),\hspace{1em}\mathbf{t}\in {\mathbb{R}}^{d}.\end{array}\]
Theorem 4.1.
Given positive-definite matrices $C_{1},C_{2}$ and $\lambda _{1},\lambda _{2}\in [0,1]$ with $\lambda _{1}+\lambda _{2}=1$, set $C=\lambda _{1}C_{1}+\lambda _{2}C_{2}$. Assume $\mathbf{t}\in \mathcal{S}$. Then
(4.4)
\[h\big({f_{C}^{\mathrm{No}}}\big)\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t}\bigg)-\lambda _{1}h\big({f_{C_{1}}^{\mathrm{No}}}\big)\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C_{1}\mathbf{t}\bigg)-\lambda _{2}h\big({f_{C_{2}}^{\mathrm{No}}}\big)\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C_{2}\mathbf{t}\bigg)\ge 0,\]
with equality iff $\lambda _{1}\lambda _{2}=0$ or $C_{1}=C_{2}$.
For $\mathbf{t}=\mathbf{0}$ we obtain $\varphi \equiv 1$, and (4.4) coincides with (4.1). Theorem 4.1 is related to the maximisation property of the weighted Gaussian entropy which takes the form of Theorem 4.2. Cf. Example 3.2 in [20].
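A numerical sketch of Theorem 4.1 in $d=1$ (all parameter values below are illustrative assumptions; natural logarithms are used): it first checks that the chosen t lies in the set $\mathcal{S}$ of (4.2)–(4.3) and then evaluates the left-hand side of (4.4).

```python
import numpy as np

C1, C2, lam1, lam2, t = 0.001, 0.002, 0.5, 0.5, 30.0
C = lam1 * C1 + lam2 * C2
h = lambda v: 0.5 * np.log(2 * np.pi * np.e * v)    # Gaussian differential entropy, d = 1
e = lambda v: np.exp(0.5 * t**2 * v)                # exp(t^T C_i t / 2) in d = 1

# Membership of t in the set S of (4.2)-(4.3)
F1 = lam1 * e(C1) + lam2 * e(C2) - e(C)
F2 = F1 * np.log(2 * np.pi * C) + lam1 * e(C1) * C1 / C + lam2 * e(C2) * C2 / C - e(C)
print(F1 >= 0, F2 <= 0)     # True, True: t lies in S for these parameters

# The weighted Ky-Fan bound (4.4)
lhs = h(C) * e(C) - lam1 * h(C1) * e(C1) - lam2 * h(C2) * e(C2)
print(lhs)                  # approximately 0.074, i.e. non-negative
```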
Theorem 4.2.
Let $f(\mathbf{x})$ be a PDF on ${\mathbb{R}}^{d}$ with mean $\mathbf{0}$ and $(d\times d)$ covariance matrix C. Let ${f}^{\mathrm{No}}(\mathbf{x})$ stand for the Gaussian PDF, again with the mean $\mathbf{0}$ and covariance matrix C. Define $(d\times d)$ matrices
(4.5)
\[\boldsymbol{\varPhi }=\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x})\mathbf{x}{\mathbf{x}}^{\mathrm{T}}f(\mathbf{x})\mathrm{d}\mathbf{x},\hspace{2em}{\boldsymbol{\varPhi }_{C}^{\mathrm{No}}}=\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x})\mathbf{x}{\mathbf{x}}^{\mathrm{T}}{f_{C}^{\mathrm{No}}}(\mathbf{x})\mathrm{d}\mathbf{x}.\]
Cf. (1.4). Assume that
\[\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x})\big[f(\mathbf{x})-{f_{C}^{\mathrm{No}}}(\mathbf{x})\big]\mathrm{d}\mathbf{x}\ge 0\]
and
\[\log \big[{(2\pi )}^{d}(\mathrm{det}\hspace{0.1667em}C)\big]\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x})\big[f(\mathbf{x})-{f_{C}^{\mathrm{No}}}(\mathbf{x})\big]\mathrm{d}\mathbf{x}+\mathrm{tr}\big[{C}^{-1}\big({\boldsymbol{\varPhi }_{C}^{\mathrm{No}}}-\boldsymbol{\varPhi }\big)\big]\le 0.\]
Then ${h_{\varphi }^{\mathrm{w}}}(f)\le {h_{\varphi }^{\mathrm{w}}}({f_{C}^{\mathrm{No}}})$, with equality iff $\varphi (\mathbf{x})[f(\mathbf{x})-{f_{C}^{\mathrm{No}}}(\mathbf{x})]=0$ a.e.
Theorems 4.1 and 4.2 are part of a series of so-called weighted determinantal inequalities. See [20, 22]. Here we will focus on a weighted version of the Hadamard inequality, which asserts that for a $(d\times d)$ positive-definite matrix $C=(C_{\mathit{ij}})$, $\mathrm{det}\hspace{0.2778em}C\le {\prod _{j=1}^{d}}C_{\mathit{jj}}$ or $\delta (C)\le {\sum _{j=1}^{d}}\log C_{\mathit{jj}}$. Cf. [20], Theorem 3.7. Let ${f_{C_{\mathit{jj}}}^{\mathrm{No}}}$ stand for the Gaussian PDF on $\mathbb{R}$ with zero mean and variance $C_{\mathit{jj}}$. Set:
\[\alpha =\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x}){f_{C}^{\mathrm{No}}}(\mathbf{x})\mathrm{d}\mathbf{x}\hspace{2.5pt}\text{(cf. (1.4)).}\]
Theorem 4.3.
Assume that
\[\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x})\Bigg[{f_{C}^{\mathrm{No}}}(\mathbf{x})-\prod \limits_{j=1}^{d}{f_{C_{\mathit{jj}}}^{\mathrm{No}}}(x_{j})\Bigg]\mathrm{d}\mathbf{x}\ge 0.\]
Then, with the matrix $\boldsymbol{\varPhi }=(\varPhi _{\mathit{ij}})$ as in (4.5),
\[\alpha \log \prod \limits_{j=1}^{d}(2\pi C_{\mathit{jj}})\hspace{0.1667em}+\hspace{0.1667em}(\log \mathrm{e})\sum \limits_{j=1}^{d}{C_{\mathit{jj}}^{-1}}\varPhi _{\mathit{jj}}\hspace{0.1667em}-\hspace{0.1667em}\alpha \log \big[{(2\pi )}^{d}(\mathrm{det}\hspace{0.1667em}C)\big]\hspace{0.1667em}-\hspace{0.1667em}(\log \mathrm{e})\mathrm{tr}\hspace{0.1667em}{C}^{-1}\boldsymbol{\varPhi }\hspace{0.1667em}\ge \hspace{0.1667em}0.\]
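A Monte Carlo sketch of Theorem 4.3 (illustrative assumptions: $d=2$, unit variances, correlation $0.5$, and the quadrant WF $\varphi (\mathbf{x})=\mathbf{1}(x_{1}x_{2}\ge 0)$, for which the hypothesis of the theorem holds since positively correlated Gaussian mass concentrates on these quadrants; natural logarithms):

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[1.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(2), C, size=1_000_000)
phi = (X[:, 0] * X[:, 1] >= 0).astype(float)        # WF evaluated on the Gaussian sample

# Hypothesis of Theorem 4.3: the quadrants {x1*x2 >= 0} carry more mass under f_C^No
# than under the product of the marginals (independent N(0,1) coordinates)
Y = rng.standard_normal((1_000_000, 2))
print(phi.mean() - (Y[:, 0] * Y[:, 1] >= 0).mean())  # positive

alpha = phi.mean()                                                       # alpha, cf. (1.4)
Phi = (phi[:, None, None] * X[:, :, None] * X[:, None, :]).mean(axis=0)  # the matrix of (4.5)

lhs = (alpha * np.log((2 * np.pi) ** 2 * C[0, 0] * C[1, 1])
       + Phi[0, 0] / C[0, 0] + Phi[1, 1] / C[1, 1]
       - alpha * np.log((2 * np.pi) ** 2 * np.linalg.det(C))
       - np.trace(np.linalg.inv(C) @ Phi))
print(lhs)                                           # non-negative, as Theorem 4.3 asserts
```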

5 A weighted Fisher information matrix

Let $\mathbf{X}=(X_{1},\dots ,X_{d})$ be a random $(1\times d)$ vector with PDF $f_{\underline{\theta }}(\mathbf{x})=f_{\mathbf{X}}(\mathbf{x},\underline{\theta })$ where $\underline{\theta }=(\theta _{1},\dots ,\theta _{m})\in {\mathbb{R}}^{m}$. Suppose that $\underline{\theta }\to f_{\underline{\theta }}$ is ${C}^{1}$. Define the score vector $S(\mathbf{x},\underline{\theta })=\mathbf{1}(f_{\underline{\theta }}(\mathbf{x})>0)(\frac{\partial }{\partial \theta _{i}}\log f_{\underline{\theta }}(\mathbf{x}),i=1,\dots ,m)$. The $m\times m$ weighted Fisher information matrix (WFIM) is defined as
(5.1)
\[{J_{\varphi }^{\mathrm{w}}}(f_{\underline{\theta }})={J_{\varphi }^{\mathrm{w}}}(\mathbf{X},\underline{\theta })=\mathbb{E}\big[\varphi (\mathbf{X})S(\mathbf{X};\underline{\theta }){S}^{T}(\mathbf{X};\underline{\theta })\big].\]
Theorem 5.1 (Connection between WFIM and weighted KL-divergence measures).
For smooth families $\{f_{\theta },\theta \in \varTheta \subseteq {\mathbb{R}}^{1}\}$ and a given WF φ, we get
(5.2)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {D_{\varphi }^{\mathrm{w}}}(f_{\theta _{1}}\| f_{\theta _{2}})& \displaystyle =\frac{1}{2}{J_{\varphi }^{\mathrm{w}}}(X,\theta _{1}){(\theta _{2}-\theta _{1})}^{2}+\mathbb{E}_{\theta _{1}}\big[\varphi (X)D_{\theta }\log f_{\theta _{1}}(X)\big](\theta _{1}-\theta _{2})\\{} & \displaystyle \hspace{1em}-\frac{1}{2}\mathbb{E}_{\theta _{1}}\bigg[\varphi (X)\frac{{D_{\theta }^{2}}f_{\theta _{1}}(X)}{f_{\theta _{1}}(X)}\bigg]{(\theta _{2}-\theta _{1})}^{2}+o\big(|\theta _{1}-\theta _{2}{|}^{2}\big)\mathbb{E}_{\theta _{1}}\big[\varphi (X)\big]\end{array}\]
where $D_{\theta }$ stands for $\frac{\partial }{\partial \theta }$.
Proof.
By virtue of a Taylor expansion of $\log f_{\theta _{2}}$ around $\theta _{1}$, we obtain
(5.3)
\[\log f_{\theta _{2}}=\log f_{\theta _{1}}+D_{\theta }\log f_{\theta _{1}}(\theta _{2}-\theta _{1})+\frac{1}{2}{D_{\theta }^{2}}\log f_{\theta _{1}}{(\theta _{2}-\theta _{1})}^{2}+O_{x}\big(|\underline{\theta }_{2}-\underline{\theta }_{1}{|}^{3}\big).\]
Here $O_{x}(|\theta _{2}-\theta _{1}{|}^{3})$ denotes the remainder term, which has a hidden dependence on x. Multiply both sides of (5.3) by φ and take expectations, assuming that we can interchange differentiation and expectation appropriately. Next, observe that
(5.4)
\[{D_{\theta }^{2}}\log f_{\theta _{1}}=\frac{{D_{\theta }^{2}}f_{\theta _{1}}}{f_{\theta _{1}}}-\frac{{(D_{\theta }f_{\theta _{1}})}^{2}}{{f_{\theta _{1}}^{2}}}.\]
Hence
(5.5)
\[\mathbb{E}_{\theta _{1}}\big[\varphi {D_{\theta }^{2}}\log f_{\theta _{1}}\big]=\mathbb{E}_{\theta _{1}}\bigg[\varphi \frac{{D_{\theta }^{2}}f_{\theta _{1}}}{f_{\theta _{1}}}\bigg]-{J_{\varphi }^{\mathrm{w}}}(f_{\theta _{1}}).\]
Therefore the claimed result, i.e., (5.2), is achieved.  □
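The expansion (5.2) can be checked numerically. The sketch below (assumed family and WF: the location family $f_{\theta }=\mathrm{N}(\theta ,1)$, for which $D_{\theta }\log f_{\theta }(x)=x-\theta $ and ${D_{\theta }^{2}}f_{\theta }(x)/f_{\theta }(x)={(x-\theta )}^{2}-1$, and $\varphi (x)=1+{x}^{2}$) compares the exact RWE with the right-hand side of (5.2) without the $o(|\theta _{1}-\theta _{2}{|}^{2})$ term.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

phi = lambda x: 1.0 + x**2
theta1, theta2 = 0.3, 0.32                  # a small increment theta2 - theta1
f1, f2 = norm(theta1, 1).pdf, norm(theta2, 1).pdf
E1 = lambda g: quad(lambda x: g(x) * f1(x), -20, 20)[0]    # expectation under f_theta1

# Exact weighted KL divergence (2.1)
lhs = quad(lambda x: phi(x) * f1(x) * np.log(f1(x) / f2(x)), -20, 20)[0]

# Right-hand side of (5.2) without the o(|theta2 - theta1|^2) remainder; for this family
# D_theta log f_theta(x) = x - theta and D_theta^2 f_theta(x) / f_theta(x) = (x - theta)**2 - 1
d = theta2 - theta1
J_w = E1(lambda x: phi(x) * (x - theta1) ** 2)             # the WFIM (5.1), a scalar here
rhs = (0.5 * J_w * d**2
       + E1(lambda x: phi(x) * (x - theta1)) * (theta1 - theta2)
       - 0.5 * E1(lambda x: phi(x) * ((x - theta1) ** 2 - 1)) * d**2)
print(lhs, rhs)                             # agreement up to o(d^2)
```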

6 Weighted entropy power inequality

Let $\mathbf{X}_{1},\mathbf{X}_{2}$ be independent RVs with PDFs $f_{1},f_{2}$ and $\mathbf{X}=\mathbf{X}_{1}+\mathbf{X}_{2}$. The famous Shannon entropy power inequality (EPI) states that
(6.1)
\[h(\mathbf{X}_{1}+\mathbf{X}_{2})\ge h(\mathbf{N}_{1}+\mathbf{N}_{2}),\]
where $\mathbf{N}_{1}$, $\mathbf{N}_{2}$ are Gaussian $\mathrm{N}(\mathbf{0},{\sigma }^{2}\mathbf{I}_{d})$ RVs such that $h(\mathbf{X}_{i})=h(\mathbf{N}_{i})$, $i=1,2$. Equivalently,
(6.2)
\[{e}^{\frac{2}{d}h(\mathbf{X}_{1}+\mathbf{X}_{2})}\ge {e}^{\frac{2}{d}h(\mathbf{X}_{1})}+{e}^{\frac{2}{d}h(\mathbf{X}_{2})},\]
see, e.g., [1, 7]. The EPI is widely used in electronics; e.g., consider an RV $\mathbf{Y}_{n}$ which satisfies
(6.3)
\[\mathbf{Y}_{n}=\sum \limits_{i=0}^{\infty }a_{i}\mathbf{X}_{n-i},n\in {\mathbf{Z}}^{1},\hspace{1em}\sum \limits_{i=0}^{\infty }|a_{i}{|}^{2}<\infty ,\]
where $a_{i}\in {\mathbf{R}}^{1}$ and $\{\mathbf{X}_{i}\}$ are IID RVs. Then the EPI yields
(6.4)
\[h(\mathbf{Y})\ge h(\mathbf{X})+\frac{1}{2}\log \Bigg(\sum \limits_{i=0}^{\infty }|a_{i}{|}^{2}\Bigg),\]
with equality if and only if either X is Gaussian or $\mathbf{Y}_{n}=\mathbf{X}_{n-k}$ for some k, that is, the filtering operation is a pure delay. Clearly, a possible extension of the EPI gives more flexibility in signal processing. We are interested in the weighted entropy power inequality (WEPI)
(6.5)
\[\kappa :=\exp \bigg(\frac{2{h_{\varphi }^{\mathrm{w}}}(\mathbf{X}_{1})}{d\mathbb{E}\varphi (\mathbf{X}_{1})}\bigg)+\exp \bigg(\frac{2{h_{\varphi }^{\mathrm{w}}}(\mathbf{X}_{2})}{d\mathbb{E}\varphi (\mathbf{X}_{2})}\bigg)\le \exp \bigg(\frac{2{h_{\varphi }^{\mathrm{w}}}(\mathbf{X})}{d\mathbb{E}\varphi (\mathbf{X})}\bigg).\]
Note that (6.5) coincides with (6.2) when $\varphi \equiv 1$. For $d=1$ we set
(6.6)
\[\alpha ={\mathrm{tan}}^{-1}\bigg[\exp \bigg(\frac{{h_{\varphi }^{\mathrm{w}}}(X_{2})}{\mathbb{E}\varphi (X_{2})}-\frac{{h_{\varphi }^{\mathrm{w}}}(X_{1})}{\mathbb{E}\varphi (X_{1})}\bigg)\bigg],\hspace{1em}Y_{1}=\frac{X_{1}}{\cos \alpha },Y_{2}=\frac{X_{2}}{\sin \alpha }.\]
Theorem 6.1.
Given independent RVs $X_{1},X_{2}\in {\mathbb{R}}^{1}$ with PDFs $f_{1},f_{2}$, and the weight function φ, set $X=X_{1}+X_{2}$. Assume the following conditions:
(i)
(6.7)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathbb{E}\varphi (X_{i})& \displaystyle \ge \mathbb{E}\varphi (X)\hspace{1em}\textit{if}\hspace{2.5pt}\kappa \ge 1,\hspace{2.5pt}i=1,2,\\{} \displaystyle \mathbb{E}\varphi (X_{i})& \displaystyle \le \mathbb{E}\varphi (X)\hspace{1em}\textit{if}\hspace{2.5pt}\kappa \le 1,\hspace{2.5pt}i=1,2.\end{array}\]
(ii) With $Y_{1},Y_{2}$ and α as defined in (6.6),
(6.8)
\[{(\cos \alpha )}^{2}{h_{\varphi _{c}}^{\mathrm{w}}}(Y_{1})+{(\sin \alpha )}^{2}{h_{\varphi _{s}}^{\mathrm{w}}}(Y_{2})\le {h_{\varphi }^{\mathrm{w}}}(X),\]
where $\varphi _{c}(x)=\varphi (x\cos \alpha )$, $\varphi _{s}(x)=\varphi (x\sin \alpha )$ and
(6.9)
\[{h_{\varphi _{c}}^{\mathrm{w}}}(Y_{1})=-\mathbb{E}\big[\varphi _{c}(Y_{1})\log \big(f_{Y_{1}}(Y_{1})\big)\big],\hspace{2em}{h_{\varphi _{s}}^{\mathrm{w}}}(Y_{2})=-\mathbb{E}\big[\varphi _{s}(Y_{2})\log \big(f_{Y_{2}}(Y_{2})\big)\big].\]
Then the WEPI holds.
Paying homage to [13], we call (6.8) the weighted Lieb splitting inequality (WLSI). In some cases the WLSI can be checked effectively.
Proof.
Note that
(6.10)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {h_{\varphi }^{\mathrm{w}}}(X_{1})& \displaystyle ={h_{\varphi _{c}}^{\mathrm{w}}}(Y_{1})+\mathbb{E}\varphi (X_{1})\log \cos \alpha ,\\{} \displaystyle {h_{\varphi }^{\mathrm{w}}}(X_{2})& \displaystyle ={h_{\varphi _{s}}^{\mathrm{w}}}(Y_{2})+\mathbb{E}\varphi (X_{2})\log \sin \alpha .\end{array}\]
Using (6.8), we have the following inequality
(6.11)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {h_{\varphi }^{\mathrm{w}}}(X)& \displaystyle \ge {(\cos \alpha )}^{2}\big[{h_{\varphi }^{\mathrm{w}}}(X_{1})-\mathbb{E}\varphi (X_{1})\log \cos \alpha \big]\\{} & \displaystyle \hspace{1em}+{(\sin \alpha )}^{2}\big[{h_{\varphi }^{\mathrm{w}}}(X_{2})-\mathbb{E}\varphi (X_{2})\log \sin \alpha \big].\end{array}\]
Furthermore, recalling the definition of κ in (6.5) we obtain
(6.12)
\[{h_{\varphi }^{\mathrm{w}}}(X)\ge \frac{1}{2\kappa }\big[\mathbb{E}\varphi (X_{1})\log \kappa \big]\exp \bigg(\frac{2{h_{\varphi }^{\mathrm{w}}}(X_{1})}{\mathbb{E}\varphi (X_{1})}\bigg)+\frac{1}{2\kappa }\big[\mathbb{E}\varphi (X_{2})\log \kappa \big]\exp \bigg(\frac{2{h_{\varphi }^{\mathrm{w}}}(X_{2})}{\mathbb{E}\varphi (X_{2})}\bigg).\]
By virtue of assumption (6.7), we derive
(6.13)
\[{h_{\varphi }^{\mathrm{w}}}(X)\ge \frac{1}{2}\mathbb{E}\varphi (X)\log \kappa .\]
The definition of κ in (6.5) leads directly to the result.  □
Example 6.2.
Let $d=1$ and $X_{1}\sim \mathrm{N}(0,{\sigma _{1}^{2}})$, $X_{2}\sim \mathrm{N}(0,{\sigma _{2}^{2}})$. Then the WLSI (6.8) takes the following form
(6.14)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \log \big[2\pi \big({\sigma _{1}^{2}}+{\sigma _{2}^{2}}\big)\big]\mathbb{E}\varphi (X)+\frac{\log e}{{\sigma _{1}^{2}}+{\sigma _{2}^{2}}}\mathbb{E}\big[{X}^{2}\varphi (X)\big]\\{} & \displaystyle \hspace{1em}\ge {(\cos \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{1}^{2}}}{{(\cos \alpha )}^{2}}\bigg)\bigg]\mathbb{E}\varphi (X_{1})+\frac{{(\cos \alpha )}^{2}\log e}{{\sigma _{1}^{2}}}\mathbb{E}\big[{X_{1}^{2}}\varphi (X_{1})\big]\\{} & \displaystyle \hspace{2em}+{(\sin \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{2}^{2}}}{{(\sin \alpha )}^{2}}\bigg)\bigg]\mathbb{E}\varphi (X_{2})+\frac{{(\sin \alpha )}^{2}\log e}{{\sigma _{2}^{2}}}\mathbb{E}\big[{X_{2}^{2}}\varphi (X_{2})\big].\end{array}\]
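The Gaussian form (6.14) lends itself to a direct numerical check. The sketch below (illustrative assumptions: ${\sigma _{1}^{2}}=1$, ${\sigma _{2}^{2}}=2.25$, WF $\varphi (x)=1+0.1\sin (x+1)$; natural logarithms) evaluates both sides of (6.14), with α computed from (6.6); the WLSI (6.8) holds for the chosen WF precisely when the first printed value is at least the second.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

v1, v2 = 1.0, 2.25                            # variances of X1 and X2; X = X1 + X2
phi = lambda x: 1.0 + 0.1 * np.sin(x + 1.0)   # an arbitrary illustrative WF

def mom(v, g):
    # E[g(X) * phi(X)] for X ~ N(0, v)
    return quad(lambda x: g(x) * phi(x) * norm(0, np.sqrt(v)).pdf(x), -30, 30)[0]

def wde_over_mean(v):
    # h_phi^w(X) / E[phi(X)] for X ~ N(0, v), using (1.4) with natural logarithms
    return (0.5 * np.log(2 * np.pi * v) * mom(v, lambda x: 1.0)
            + 0.5 * mom(v, lambda x: x**2) / v) / mom(v, lambda x: 1.0)

alpha = np.arctan(np.exp(wde_over_mean(v2) - wde_over_mean(v1)))   # the angle of (6.6)
cos2, sin2 = np.cos(alpha) ** 2, np.sin(alpha) ** 2

lhs = (np.log(2 * np.pi * (v1 + v2)) * mom(v1 + v2, lambda x: 1.0)
       + mom(v1 + v2, lambda x: x**2) / (v1 + v2))
rhs = (cos2 * np.log(2 * np.pi * v1 / cos2) * mom(v1, lambda x: 1.0)
       + cos2 * mom(v1, lambda x: x**2) / v1
       + sin2 * np.log(2 * np.pi * v2 / sin2) * mom(v2, lambda x: 1.0)
       + sin2 * mom(v2, lambda x: x**2) / v2)
print(lhs, rhs)    # the WLSI (6.8) holds for this WF iff lhs >= rhs
```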
Example 6.3.
Let $d=1$, $X=X_{1}+X_{2}$, and let $X_{1}\sim \mathrm{U}[a_{1},b_{1}]$ and $X_{2}\sim \mathrm{U}[a_{2},b_{2}]$ be independent. Set $\varPhi (x)={\int _{0}^{x}}\varphi (u)\mathrm{d}u$ and $L_{i}=b_{i}-a_{i}$, $i=1,2$. The WDE equals ${h_{\varphi }^{\mathrm{w}}}(X_{i})=\frac{\varPhi (b_{i})-\varPhi (a_{i})}{L_{i}}\log L_{i}$. Then the inequality $\kappa \ge (\le )1$ takes the form ${L_{1}^{2}}+{L_{2}^{2}}\ge (\le )1$. Suppose for definiteness that $L_{2}\ge L_{1}$ or, equivalently, $C_{1}:=a_{2}+b_{1}\le a_{1}+b_{2}=:C_{2}$. Inequalities (6.7) take the form
(6.15)
\[L_{2}\big[\varPhi (b_{1})-\varPhi (a_{1})\big],\hspace{2em}L_{1}\big[\varPhi (b_{2})-\varPhi (a_{2})\big]\ge (\le )\mathbb{E}\varphi (X).\]
The WLSI takes the form
(6.16)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle -\varLambda +\log (L_{1}L_{2})\mathbb{E}\varphi (X)& \displaystyle \ge {(\cos \alpha )}^{2}\frac{\varPhi (b_{1})-\varPhi (a_{1})}{L_{1}}\log \bigg(\frac{L_{1}}{\cos \alpha }\bigg)\\{} & \displaystyle \hspace{1em}+{(\sin \alpha )}^{2}\frac{\varPhi (b_{2})-\varPhi (a_{2})}{L_{2}}\log \bigg(\frac{L_{2}}{\sin \alpha }\bigg),\end{array}\]
where
(6.17)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \varLambda & \displaystyle =\frac{\log L_{1}}{L_{2}}\big[\varPhi (C_{1})-\varPhi (C_{2})\big]+\frac{1}{L_{1}L_{2}}\Bigg[{\int _{A}^{C_{1}}}\varphi (x)(x-A)\log (x-A)\mathrm{d}x\\{} & \displaystyle \hspace{1em}+{\int _{C_{2}}^{B}}\varphi (x)(B-x)\log (B-x)\mathrm{d}x\Bigg],\\{} \displaystyle A& \displaystyle =a_{1}+a_{2},\hspace{2.5pt}B=b_{1}+b_{2}.\end{array}\]
Finally, define ${\varPhi }^{\ast }(x)={\int _{0}^{x}}u\varphi (u)\mathrm{d}u$ and note that
(6.18)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathbb{E}\varphi (X)& \displaystyle =\frac{1}{L_{1}L_{2}}\Big\{\big[{\varPhi }^{\ast }(C_{1})-{\varPhi }^{\ast }(A)-{\varPhi }^{\ast }(B)+{\varPhi }^{\ast }(C_{2})\big]\\{} & \displaystyle \hspace{1em}-A\big[\varPhi (C_{1})-\varPhi (A)\big]+L_{1}\big[\varPhi (C_{2})-\varPhi (C_{1})\big]+B\big[\varPhi (B)-\varPhi (C_{2})\big]\Big\}.\end{array}\]

7 The WLSI for a WF close to a constant

Proposition 7.1.
Let $d=1$, let $X_{i}\sim \mathrm{N}(\mu _{i},{\sigma _{i}^{2}}),i=1,2$, be independent and $X=X_{1}+X_{2}\sim \mathrm{N}(\mu _{1}+\mu _{2},{\sigma _{1}^{2}}+{\sigma _{2}^{2}})$. Suppose that the WF $x\to \varphi (x)$ is twice continuously differentiable and
(7.1)
\[\big|{\varphi ^{\prime\prime }}(x)\big|\le \epsilon \varphi (x),\hspace{2em}\big|\varphi (x)-\bar{\varphi }\big|\le \epsilon ,\]
where $\epsilon >0$ and $\bar{\varphi }>0$ are constants. Then there exists $\epsilon _{0}>0$ such that for any WF φ satisfying (7.1) with $0<\epsilon <\epsilon _{0}$ the WLSI holds true. Hence, checking the WEPI reduces to condition (6.7).
For a RV Z, $\gamma >0$ and independent Gaussian RV $\mathbf{N}\sim \mathrm{N}(\mathbf{0},\mathbf{I}_{d})$ define
(7.2)
\[M(\mathbf{Z};\gamma )=\mathbb{E}\big[{\big\| \mathbf{Z}-\mathbb{E}[\mathbf{Z}|\mathbf{Z}\sqrt{\gamma }+\mathbf{N}]\big\| }^{2}\big],\]
where $\| .\| $ stands for the Euclidean norm. According to [24, 8] the differential entropy
(7.3)
\[h(\mathbf{Z})=h(\mathbf{N})+\frac{1}{2}{\int _{0}^{\infty }}\big[M(\mathbf{Z};\gamma )-\mathbf{1}_{\{\gamma <1\}}\big]\mathrm{d}\gamma .\]
For $\mathbf{Z}=\mathbf{Y}_{1},\mathbf{Y}_{2},\mathbf{X}_{1}+\mathbf{X}_{2}$ assume the following conditions
(7.4)
\[\mathbb{E}\big[\big|\log f_{\mathbf{Z}}(\mathbf{Z})\big|\big]<\infty ,\mathbb{E}\big[\| \mathbf{Z}{\| }^{2}\big]<\infty \]
and the uniform integrability condition: for independent $\mathbf{N},{\mathbf{N}^{\prime }}\sim \mathrm{N}(\mathbf{0},\mathbf{I})$ and any $\gamma >0$ there exists an integrable function $\xi (\mathbf{Z},\mathbf{N})$ such that
(7.5)
\[\bigg|\log \mathbb{E}\bigg[f_{\mathbf{Z}}\bigg(\mathbf{Z}+\frac{\mathbf{N}-{\mathbf{N}^{\prime }}}{\sqrt{\gamma }}|\mathbf{Z},\mathbf{N}\bigg)\bigg]\bigg|\le \xi (\mathbf{Z},\mathbf{N}).\]
Theorem 7.2.
Let $d=1$ and assume conditions (7.4), (7.5). Let $\gamma _{0}$ be a point of continuity of $M(Z;\gamma ),Z=Y_{1},Y_{2},X_{1}+X_{2}$. Suppose that there exists $\delta >0$ such that
(7.6)
\[M(X_{1}+X_{2};\gamma _{0})\ge M(Y_{1},\gamma _{0}){(\cos \alpha )}^{2}+M(Y_{2};\gamma _{0}){(\sin \alpha )}^{2}+\delta .\]
Suppose that for some $\bar{\varphi }>0$ the WF satisfies
(7.7)
\[\big|\varphi (x)-\bar{\varphi }\big|<\epsilon .\]
Then there exists $\epsilon _{0}=\epsilon _{0}(\gamma _{0},\delta ,f_{1},f_{2})$ such that for any WF satisfying (7.7) with $\epsilon <\epsilon _{0}$ the WLSI holds true.
Proof.
For a constant WF $\bar{\varphi }$, the following inequality is valid (see [8], Lemma 4.2 or [24], Eqns (9) and (10))
(7.8)
\[{(\cos \alpha )}^{2}{h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{1})+{(\sin \alpha )}^{2}{h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{2})\le {h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{1}\cos \alpha +Y_{2}\sin \alpha ).\]
However, in view of Theorem 4.1 from [8], under conditions (7.4) and (7.5) the representation (7.3) and inequality (7.6) imply a stronger inequality
(7.9)
\[{(\cos \alpha )}^{2}{h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{1})+{(\sin \alpha )}^{2}{h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{2})+c_{0}\delta \le {h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{1}\cos \alpha +Y_{2}\sin \alpha ).\]
Here $c_{0}>0$ and the term of order δ appears from integration in (7.3) in a neighbourhood of the continuity point $\gamma _{0}$. Define ${\varphi }^{\ast }(x)=|\varphi (x)-\bar{\varphi }|$. It is easy to check that
(7.10)
\[{h_{{\varphi }^{\ast }}^{\mathrm{w}}}(Z)<c_{1}\epsilon ,\hspace{1em}Z=X_{1},X_{2},X_{1}+X_{2}.\]
From (7.9) and (7.10) we obtain that for ϵ small enough
(7.11)
\[{(\cos \alpha )}^{2}{h_{\varphi }^{\mathrm{w}}}(Y_{1})+{(\sin \alpha )}^{2}{h_{\varphi }^{\mathrm{w}}}(Y_{2})\le {h_{\varphi }^{\mathrm{w}}}(Y_{1}\cos \alpha +Y_{2}\sin \alpha ),\]
i.e., the WLSI holds true.  □
As an example, consider the case where RVs $X_{1},X_{2}$ are normal and WF $\varphi \in {C}^{2}$.
Proposition 7.3.
Let the RVs $X_{i}\sim \mathrm{N}(\mu _{i},{\sigma _{i}^{2}}),i=1,2$, be independent, and $X=X_{1}+X_{2}\sim \mathrm{N}(\mu _{1}+\mu _{2},{\sigma _{1}^{2}}+{\sigma _{2}^{2}})$. Suppose that the WF $x\in \mathbb{R}\to \varphi (x)\ge 0$ is twice continuously differentiable and slowly varying in the sense that $\forall x$,
(7.12)
\[\big|{\varphi ^{\prime\prime }}(x)\big|\le \epsilon \varphi (x),\hspace{2em}\big|\varphi (x)-\bar{\varphi }\big|<\epsilon ,\]
where $\epsilon >0$ and $\bar{\varphi }>0$ are constants. Then there exists $\epsilon _{0}=\epsilon _{0}(\mu _{1},\mu _{2},{\sigma _{1}^{2}},{\sigma _{2}^{2}})>0$ such that for any $0<\epsilon \le \epsilon _{0}$, the WLSI (6.8) with the WF φ holds true.
Proof.
Let α be as in (6.6); to check (6.8), we use Stein’s formula: for $Z\sim \mathrm{N}(0,{\sigma }^{2})$
(7.13)
\[\mathbb{E}\big[{Z}^{2}\varphi (Z)\big]={\sigma }^{2}\mathbb{E}\big[\varphi (Z)\big]+{\sigma }^{4}\mathbb{E}\big[{\varphi ^{\prime\prime }}(Z)\big].\]
Owing to the inequality $|\varphi (x)-\bar{\varphi }|<\epsilon $ we have
(7.14)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \alpha <\alpha _{0}& \displaystyle ={\tan }^{-1}\big(\exp \big({(\bar{\varphi }+\epsilon )}^{2}\big[h_{+}(X_{2})-h_{-}(X_{1})\big]\\{} & \displaystyle \hspace{1em}-{(\bar{\varphi }-\epsilon )}^{2}\big[h_{+}(X_{1})-h_{-}(X_{2})\big]\big)\big).\end{array}\]
Here
(7.15)
\[h_{\pm }(X_{i})=-\mathbb{E}\big[\mathbf{1}\big(X_{i}\in {A_{\pm }^{i}}\big)\log {f_{X_{i}}^{No}}(X_{i})\big],\hspace{1em}i=1,2.\]
and
(7.16)
\[{A_{+}^{i}}=\big\{x\in \mathbf{R}:{f_{i}^{No}}(x)<1\big\},\hspace{2em}{A_{-}^{i}}=\big\{x\in \mathbf{R}:{f_{i}^{No}}(x)>1\big\},\hspace{1em}i=1,2.\]
Evidently, under conditions $|{\varphi ^{\prime }}(x)|,|{\varphi ^{\prime\prime }}(x)|<\epsilon \varphi (x)$ we have that $\alpha _{0}<\frac{\pi }{2}-\epsilon $ and $0<\epsilon <{(\sin \alpha )}^{2}$, ${(\cos \alpha )}^{2}<1-\epsilon <1$. We claim that inequality (6.14) is satisfied with φ replaced by $\bar{\varphi }$ and added $\delta >0$:
(7.17)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \log \big[2\pi \big({\sigma _{1}^{2}}+{\sigma _{2}^{2}}\big)\big]\bar{\varphi }+\frac{\log e}{{\sigma _{1}^{2}}+{\sigma _{2}^{2}}}\mathbb{E}\big[{X}^{2}\big]\bar{\varphi }\\{} & \displaystyle \hspace{1em}\ge {(\cos \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{1}^{2}}}{{(\cos \alpha )}^{2}}\bigg)\bigg]\bar{\varphi }+\frac{{(\cos \alpha )}^{2}\log e}{{\sigma _{1}^{2}}}\mathbb{E}\big[{X_{1}^{2}}\big]\bar{\varphi }\\{} & \displaystyle \hspace{2em}+{(\sin \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{2}^{2}}}{{(\sin \alpha )}^{2}}\bigg)\bigg]\bar{\varphi }+\frac{{(\sin \alpha )}^{2}\log e}{{\sigma _{2}^{2}}}\mathbb{E}\big[{X_{2}^{2}}\big]\bar{\varphi }+\delta .\end{array}\]
Here $\delta >0$ is calculated through ϵ and increases to a limit $\delta _{0}>0$ as $\epsilon \downarrow 0$. Indeed, strict concavity of $\log y$ for $y\in [0,\frac{2\pi {\sigma _{1}^{2}}}{{(\cos \alpha )}^{2}}\vee \frac{2\pi {\sigma _{2}^{2}}}{{(\sin \alpha )}^{2}}]$ implies that
(7.18)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \log \big[2\pi \big({\sigma _{1}^{2}}+{\sigma _{2}^{2}}\big)\big]& \displaystyle \ge {(\cos \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{1}^{2}}}{{(\cos \alpha )}^{2}}\bigg)\bigg]\\{} & \displaystyle \hspace{1em}+{(\sin \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{2}^{2}}}{{(\sin \alpha )}^{2}}\bigg)\bigg]+\delta .\end{array}\]
On the other hand,
(7.19)
\[\frac{1}{{\sigma _{1}^{2}}+{\sigma _{2}^{2}}}\bar{\varphi }\mathbb{E}\big[{X}^{2}\big]=\frac{{(\cos \alpha )}^{2}}{{\sigma _{1}^{2}}}\bar{\varphi }\mathbb{E}\big[{X_{1}^{2}}\big]+\frac{{(\sin \alpha )}^{2}}{{\sigma _{2}^{2}}}\bar{\varphi }\mathbb{E}\big[{X_{2}^{2}}\big].\]
Combining (7.18) and (7.19) one gets (7.17). Now, to check (6.8) with WF φ, in view of (7.17) it suffices to verify
(7.20)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \log \big[2\pi \big({\sigma _{1}^{2}}+{\sigma _{2}^{2}}\big)\big]\big[\mathbb{E}\varphi (X)-\bar{\varphi }\big]+\frac{\log e}{{\sigma _{1}^{2}}+{\sigma _{2}^{2}}}\mathbb{E}\big[{X}^{2}\big(\varphi (X)-\bar{\varphi }\big)\big]\\{} & \displaystyle \hspace{1em}-{(\cos \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{1}^{2}}}{{(\cos \alpha )}^{2}}\bigg)\bigg]\big[\mathbb{E}\varphi (X_{1})-\bar{\varphi }\big]+\frac{{(\cos \alpha )}^{2}\log e}{{\sigma _{1}^{2}}}\mathbb{E}\big[{X_{1}^{2}}\big(\varphi (X_{1})-\bar{\varphi }\big)\big]\\{} & \displaystyle \hspace{1em}-{(\sin \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{2}^{2}}}{{(\sin \alpha )}^{2}}\bigg)\bigg]\big[\mathbb{E}\varphi (X_{2})-\bar{\varphi }\big]+\frac{{(\sin \alpha )}^{2}\log e}{{\sigma _{2}^{2}}}\mathbb{E}\big[{X_{2}^{2}}\big(\varphi (X_{2})-\bar{\varphi }\big)\big]<\delta .\end{array}\]
We check (7.20) by brute force, claiming that each term in (7.20) has absolute value $<\delta /6$ when ϵ is small enough. For the terms containing $\mathbb{E}[\varphi (Z)-\bar{\varphi }]$, $Z=X,X_{1},X_{2}$, this follows since $|\varphi -\bar{\varphi }|<\epsilon $. For the terms containing the factor $\mathbb{E}[{Z}^{2}(\varphi (Z)-\bar{\varphi })]$, we use Stein's formula (7.13) and the condition that $|{\varphi ^{\prime\prime }}(x)|\le \epsilon \varphi (x)$.  □
Similar assertions can be established for other examples of PDFs $f_{1}(x)$ and $f_{2}(x)$, e.g., uniform, exponential, Gamma, Cauchy, etc.
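Stein's formula (7.13), the main computational tool in the proof above, is easy to verify numerically; a minimal sketch (with the assumed WF $\varphi (z)=1+0.1\sin (z+1)$):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma2 = 2.0
pdf = norm(0, np.sqrt(sigma2)).pdf
phi = lambda z: 1.0 + 0.1 * np.sin(z + 1.0)
phi2 = lambda z: -0.1 * np.sin(z + 1.0)        # second derivative of phi

E = lambda g: quad(lambda z: g(z) * pdf(z), -30, 30)[0]
print(E(lambda z: z**2 * phi(z)),              # left-hand side of (7.13)
      sigma2 * E(phi) + sigma2**2 * E(phi2))   # right-hand side of (7.13); the values agree
```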

8 A weighted Fisher information inequality

Let $\mathbf{Z}=(\mathbf{X},\mathbf{Y})$ be a pair of independent RVs $\mathbf{X},\mathbf{Y}\in {\mathbb{R}}^{d}$, with sample values $\mathbf{z}=(\mathbf{x},\mathbf{y})\in {\mathbb{R}}^{d}\times {\mathbb{R}}^{d}$ and marginal PDFs $f_{1}(\mathbf{x},\underline{\theta }),f_{2}(\mathbf{y},\underline{\theta })$, respectively. Let $f_{\mathbf{Z}|\mathbf{X}+\mathbf{Y}}(\mathbf{x},\mathbf{y}|\mathbf{u})$ stand for the conditional PDF:
(8.1)
\[f_{\mathbf{Z}|\mathbf{X}+\mathbf{Y}}(\mathbf{x},\mathbf{y}|\mathbf{u})=\frac{f_{1}(\mathbf{x})f_{2}(\mathbf{y})\mathbf{1}(\mathbf{x}+\mathbf{y}=\mathbf{u})}{\int _{{\mathbb{R}}^{d}}f_{1}(\mathbf{v})f_{2}(\mathbf{u}-\mathbf{v})\mathrm{d}\mathbf{v}}.\]
Given a WF $\mathbf{z}=(\mathbf{x},\mathbf{y})\in {\mathbb{R}}^{d}\times {\mathbb{R}}^{d}\mapsto \varphi (\mathbf{z})\ge 0$, we employ the following reduced WFs:
(8.2)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \varphi (\mathbf{u})& \displaystyle =\int \varphi (\mathbf{v},\mathbf{u}-\mathbf{v})f_{\mathbf{Z}|\mathbf{X}+\mathbf{Y}}(\mathbf{v},\mathbf{u}-\mathbf{v})\mathrm{d}\mathbf{v},\\{} \displaystyle \varphi _{1}(\mathbf{x})& \displaystyle =\int \varphi (\mathbf{x}+\mathbf{y},\mathbf{y})f_{2}(\mathbf{y})\mathrm{d}\mathbf{y},\varphi _{2}(\mathbf{y})=\int \varphi (\mathbf{x},\mathbf{x}+\mathbf{y})f_{1}(\mathbf{x})\mathrm{d}\mathbf{x}.\end{array}\]
Next, let us introduce the matrices $M_{\varphi }$ and $G_{\varphi }$:
(8.3)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle M_{\varphi }& \displaystyle =\int \varphi (\mathbf{x},\mathbf{y})f_{1}(\mathbf{x})f_{2}(\mathbf{y}){\bigg(\frac{\partial \log f_{1}(\mathbf{x})}{\partial \underline{\theta }}\bigg)}^{\mathrm{T}}\bigg(\frac{\partial \log f_{2}(\mathbf{y})}{\partial \underline{\theta }}\bigg)\mathbf{1}\big(f_{1}(\mathbf{x})f_{2}(\mathbf{y})>0\big)\mathrm{d}\mathbf{x}\mathrm{d}\mathbf{y},\\{} \displaystyle G_{\varphi }& \displaystyle ={\big({J_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}M_{\varphi }{\big({J_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}.\end{array}\]
Note that for $\varphi \equiv 1$ we have $M_{\varphi }=G_{\varphi }=0$ and the classical Fisher information inequality emerges (cf. [27]). Finally, we define
(8.4)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \varXi & \displaystyle :=\varXi _{\varphi _{1},\varphi _{2}}(\mathbf{X},\mathbf{Y})=M_{\varphi }{J_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})G_{\varphi }{(\mathbf{I}-M_{\varphi }G_{\varphi })}^{-1}M_{\varphi }\big[G_{\varphi }{J_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})G_{\varphi }-{J_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big]\\{} & \displaystyle \hspace{1em}+G_{\varphi }{(\mathbf{I}-M_{\varphi }G_{\varphi })}^{-1}M_{\varphi }G_{\varphi }{J_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big[{M_{\varphi }^{-1}}-G_{\varphi }\big]-G_{\varphi }{J_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})G_{\varphi }-G_{\varphi }.\end{array}\]
Theorem 8.1 (A weighted Fisher information inequality (WFII)).
Let X and Y be independent RVs. Assume that ${f_{\mathbf{X}}^{(1)}}=\frac{\partial }{\partial \underline{\theta }}f_{1}$ is not a multiple of ${f_{\mathbf{Y}}^{(1)}}=\frac{\partial }{\partial \underline{\theta }}f_{2}$. Then
(8.5)
\[{J_{\varphi }^{\mathrm{w}}}(\mathbf{X}+\mathbf{Y})\le (\mathbf{I}-M_{\varphi }G_{\varphi }){\big[{\big({J_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+{\big({J_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}-\varXi _{\varphi _{1},\varphi _{2}}(\mathbf{X},\mathbf{Y})\big]}^{-1}.\]
Proof.
We use the same methodology as in Theorem 1 from [27]. Recalling Corollary 4.8, (iii) in [20] substitute $\mathtt{P}:=[1,1]$. Therefore for $\mathbf{Z}=(\mathbf{X},\mathbf{Y})$, ${J}^{\mathrm{w}}(\mathbf{Z})$ is an $m\times m$ matrix
(8.6)
\[{\big({\mathtt{J}_{\theta }^{\mathrm{w}}}(\mathbf{Z})\big)}^{-1}={\left(\begin{array}{c@{\hskip10.0pt}c}{\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})& M_{\varphi }\\{} M_{\varphi }& {\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\end{array}\right)}^{-1}.\]
Next, we need the following well-known expression for the inverse of a block matrix
(8.7)
\[{\left(\begin{array}{c@{\hskip10.0pt}c}\mathbf{C}_{11}& \mathbf{C}_{12}\\{} \mathbf{C}_{21}& \mathbf{C}_{22}\end{array}\right)}^{-1}=\left(\begin{array}{c@{\hskip10.0pt}c}{\mathbf{C}_{11}^{-1}}+\mathbf{D}_{12}{\mathbf{D}_{22}^{-1}}\mathbf{D}_{21}& -\mathbf{D}_{12}{\mathbf{D}_{22}^{-1}}\\{} -{\mathbf{D}_{22}^{-1}}\mathbf{D}_{21}& {\mathbf{D}_{22}^{-1}}\end{array}\right),\]
where
\[\begin{array}{r@{\hskip10.0pt}c}& \displaystyle \mathbf{D}_{22}=\mathbf{C}_{22}-\mathbf{C}_{21}{\mathbf{C}_{11}^{-1}}\mathbf{C}_{12},\hspace{2em}\mathbf{D}_{12}={\mathbf{C}_{11}^{-1}}\mathbf{C}_{12},\hspace{2em}\mathbf{D}_{21}=\mathbf{C}_{21}{\mathbf{C}_{11}^{-1}},\\{} & \displaystyle \mathbf{C}_{11}={\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X}),\hspace{2em}\mathbf{C}_{22}={\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y}),\hspace{1em}\text{and}\hspace{1em}\mathbf{C}_{12}=\mathbf{C}_{21}=M_{\varphi }.\end{array}\]
By using the Schwarz inequality, we derive
(8.8)
\[{M_{\varphi }^{2}}\le {\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\hspace{0.2778em}{\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y}),\hspace{1em}\text{or }M_{\varphi }\hspace{0.2778em}G_{\varphi }\le \mathbf{I},\]
with equality iff ${f_{\mathbf{X}}^{(1)}}(\mathbf{x})=\frac{\partial \log f_{1}(\mathbf{x})}{\partial \underline{\theta }}\propto \frac{\partial \log f_{2}(\mathbf{y})}{\partial \underline{\theta }}={f_{\mathbf{Y}}^{(1)}}(\mathbf{y})$.
Define
(8.9)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \delta & \displaystyle :={\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})-\tilde{M}_{\varphi }{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}M_{\varphi }=(\mathtt{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi }){\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\\{} & \displaystyle \Rightarrow {\delta }^{-1}={\big({\mathtt{J}_{\theta _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}{(\mathtt{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}.\end{array}\]
Thus, owing to (8.7), particularly for $\mathtt{P}=[1,1]$, we can write
(8.10)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathtt{P}{\big({\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{Z})\big)}^{-1}{\mathtt{P}}^{T}& \displaystyle ={\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}M_{\varphi }\hspace{0.2778em}{\delta }^{-1}\hspace{0.2778em}M_{\varphi }{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}\\{} & \displaystyle \hspace{1em}-{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}M_{\varphi }\hspace{0.2778em}{\delta }^{-1}-{\delta }^{-1}\hspace{0.2778em}M_{\varphi }\hspace{0.2778em}{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+{\delta }^{-1}.\end{array}\]
Substituting (8.9) into the above expression, we have
(8.11)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \mathtt{P}{\big({\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{Z})\big)}^{-1}{\mathtt{P}}^{T}\\{} & \displaystyle \hspace{1em}={\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+G_{\varphi }{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}M_{\varphi }\hspace{0.2778em}{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}\\{} & \displaystyle \hspace{2em}-G_{\varphi }{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}-{\big({\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}M_{\varphi }\hspace{0.2778em}{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}\\{} & \displaystyle \hspace{2em}+{\big({\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}\\{} & \displaystyle \hspace{1em}=\big\{{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })\\{} & \displaystyle \hspace{2em}+G_{\varphi }{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}M_{\varphi }{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })-G_{\varphi }\\{} & \displaystyle \hspace{2em}-{\big({\mathtt{J}_{\theta _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}\tilde{M}_{\varphi }{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}(\mathtt{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })\\{} & \displaystyle \hspace{2em}+{\big({\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}\big\}{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}.\end{array}\]
Consequently, simplifying (8.11), one obtains
(8.12)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \mathtt{P}{\big({\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{Z})\big)}^{-1}{\mathtt{P}}^{T}\\{} & \displaystyle \hspace{1em}=\big\{{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+{\big({\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}+\varXi _{\varphi _{1},\varphi _{2}}(\mathbf{X},\mathbf{Y})\big\}{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}.\end{array}\]
By using Corollary 3.4, (iii) from [20], we obtain the property claimed in (8.5):
\[{\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{X}+\mathbf{Y})\le {\big\{\big[{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+{\big({\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}+\varXi _{\varphi _{1},\varphi _{2}}(\mathbf{X},\mathbf{Y})\big]{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}\big\}}^{-1}.\]
This concludes the proof.  □
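The block-inverse identity (8.7), on which the proof relies, can be verified numerically; a minimal sketch with randomly generated blocks (sizes and entries are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
C11 = np.eye(n) + 0.1 * rng.standard_normal((n, n)); C11 = C11 @ C11.T   # invertible block
C22 = np.eye(n) + 0.1 * rng.standard_normal((n, n)); C22 = C22 @ C22.T
C12 = 0.1 * rng.standard_normal((n, n))
C21 = 0.1 * rng.standard_normal((n, n))

D22 = C22 - C21 @ np.linalg.inv(C11) @ C12
D12 = np.linalg.inv(C11) @ C12
D21 = C21 @ np.linalg.inv(C11)

block = np.block([[C11, C12], [C21, C22]])
inv_formula = np.block([
    [np.linalg.inv(C11) + D12 @ np.linalg.inv(D22) @ D21, -D12 @ np.linalg.inv(D22)],
    [-np.linalg.inv(D22) @ D21, np.linalg.inv(D22)]])
print(np.allclose(np.linalg.inv(block), inv_formula))    # True
```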
Proposition 8.2.
Consider the additive-noise RV $\mathbf{Z}=\mathbf{X}+\mathbf{N}_{\varSigma }$, where $\mathbf{N}_{\varSigma }\sim \mathrm{N}(\mathbf{0},\varSigma )$ is independent of X. Introduce the matrices
(8.13)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle V_{\varphi }(\mathbf{X}|\mathbf{Z})& \displaystyle =\mathbb{E}\big[\varphi \hspace{0.2778em}{\big(\mathbf{X}-\mathbb{E}[\mathbf{X}|\mathbf{Z}]\big)}^{\mathrm{T}}\big(\mathbf{X}-\mathbb{E}(\mathbf{X}|\mathbf{Z})\big)\big],\\{} \displaystyle E_{\varphi }& \displaystyle =\mathbb{E}\big[\varphi \hspace{0.2778em}{\big(\mathbf{Z}-\mathbb{E}[\mathbf{X}|\mathbf{Z}]\big)}^{\mathrm{T}}\big(\mathbf{X}-\mathbb{E}[\mathbf{X}|\mathbf{Z}]\big)\big],\hspace{1em}\overline{E}_{\varphi }=E_{\varphi }+{E_{\varphi }^{\mathrm{T}}}.\end{array}\]
The WFIM of RV Z can be written as
(8.14)
\[{\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{Z})={\big({\varSigma }^{-1}\big)}^{\mathrm{T}}\big\{\mathbb{E}\big[\varphi \hspace{0.2778em}{\mathbf{N}_{\varSigma }^{\mathrm{T}}}\mathbf{N}_{\varSigma }\big]+\overline{E}_{\varphi }-V_{\varphi }(\mathbf{X}|\mathbf{Z})\big\}{\varSigma }^{-1}.\]

9 The weighted entropy power is a concave function

Let $\mathbf{Z}=\mathbf{X}+\mathbf{Y}$ and $\mathbf{Y}\sim \mathrm{N}(\mathbf{0},\sqrt{\gamma }\mathbf{I}_{d})$. In the literature, several elegant proofs, employing the Fisher information inequality or basic properties of mutual information, have been proposed in order to prove that the entropy power (EP) is a concave function of γ [2, 25]. We are interested in the weighted entropy power (WEP) defined as follows:
(9.1)
\[{\mathrm{N}_{\varphi }^{\mathrm{w}}}(\mathbf{Z}):={\mathrm{N}_{\varphi }^{\mathrm{w}}}(f_{\mathbf{Z}})=\exp \bigg\{\frac{2\hspace{0.2778em}{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{d\hspace{0.2778em}\mathbb{E}[\varphi (\mathbf{Z})]}\bigg\}.\]
Compute the second derivative of the WEP
(9.2)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{{\mathrm{d}}^{2}}{\mathrm{d}{\gamma }^{2}}\exp \bigg\{\frac{2}{d}\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg\}& \displaystyle =\exp \bigg\{\frac{2}{d}\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg\}\\{} & \displaystyle \hspace{1em}\times \bigg[{\bigg(\frac{2}{d}\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg)}^{2}+\bigg(\frac{2}{d}\frac{{\mathrm{d}}^{2}}{\mathrm{d}{\gamma }^{2}}\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg)\bigg]\\{} & \displaystyle =\exp \bigg\{\frac{2}{d}\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg\}\bigg[{\big(\varLambda (\gamma )\big)}^{2}+\frac{\mathrm{d}}{\mathrm{d}\gamma }\varLambda (\gamma )\bigg],\end{array}\]
where
(9.3)
\[\varLambda (\gamma )=\frac{2}{d}\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}.\]
In view of (9.2) the concavity of the WEP is equivalent to the inequality
(9.4)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }{\big(\varLambda (\gamma )\big)}^{-1}\ge 1.\]
In the spirit of the WEP, we shall present a new proof of the concavity of the EP. To this end, let us apply the WFII (8.5) with $\varphi \equiv 1$. Then a straightforward computation gives
(9.5)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{d}{\mathrm{tr}\hspace{0.2778em}J(\mathbf{Z})}\ge 1.\]
Theorem 9.1 (A weighted De Bruijn’s identity).
Let $\mathbf{X}\sim f_{\mathbf{X}}$ be an RV in ${\mathbb{R}}^{d}$, with a PDF $f_{\mathbf{X}}\in {C}^{2}$. For a standard Gaussian RV $\mathbf{N}\sim \mathrm{N}(\mathbf{0},\mathbf{I}_{d})$ independent of X, and given $\gamma >0$, define the RV $\mathbf{Z}=\mathbf{X}+\sqrt{\gamma }\mathbf{N}$ with PDF $f_{\mathbf{Z}}$. Let $\mathbf{V}_{r}$ be the d-dimensional ball of radius r centered at the origin, with surface denoted by $\mathbf{S}_{r}$. Assume that for the given WF φ and $\forall \gamma \in (0,1)$ the relations
(9.6)
\[\int f_{\mathbf{Z}}(\mathbf{x})\big|\ln f_{\mathbf{Z}}(\mathbf{x})\big|\mathrm{d}\mathbf{x}<\infty ,\hspace{2em}\int \big|\nabla \hspace{0.2778em}f_{\mathbf{Z}}(\mathbf{y})\ln f_{\mathbf{Z}}(\mathbf{y})\big|\mathrm{d}\mathbf{y}<\infty \]
and
(9.7)
\[\underset{r\to \infty }{\lim }\int _{\mathbb{S}_{r}}\varphi (\mathbf{y})\log f_{Z}(\mathbf{y})\big(\nabla f_{Z}(\mathbf{y})\big)\mathrm{d}S_{r}=0\]
are fulfilled. Then
(9.8)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})=\frac{1}{2}\hspace{0.2778em}\mathrm{tr}\hspace{0.2778em}{\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{Z})-\frac{1}{2}\mathbb{E}\bigg[\varphi \hspace{0.2778em}\frac{\Delta f_{Z}(\mathbf{Z})}{f_{Z}(\mathbf{Z})}\bigg]+\frac{\mathcal{R}(\gamma )}{2}.\]
Here
(9.9)
\[\mathcal{R}(\gamma )=\mathbb{E}\big[\nabla \varphi \hspace{0.2778em}\log f_{\mathbf{Z}}(\mathbf{Z}){\big(\nabla \log f_{\mathbf{Z}}(\mathbf{Z})\big)}^{\mathrm{T}}\big].\]
If we assume that $\varphi \equiv 1$, then the identity (9.8), combined with (9.5), directly implies (9.4). Hence, the standard entropy power is a concave function of γ.
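For $\varphi \equiv 1$ and $d=1$, identity (9.8) reduces to the classical De Bruijn identity $\frac{\mathrm{d}}{\mathrm{d}\gamma }h(\mathbf{Z})=\frac{1}{2}J(\mathbf{Z})$, which the following sketch checks numerically (assumption: $\mathbf{X}\sim \mathrm{U}[0,1]$, so that $f_{\mathbf{Z}}$ has a closed form):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def f_Z(z, g):
    # closed-form density of Z = X + sqrt(g)*N with X ~ Uniform[0,1], N ~ N(0,1)
    return norm.cdf(z / np.sqrt(g)) - norm.cdf((z - 1.0) / np.sqrt(g))

def df_Z(z, g):
    # derivative of f_Z in z
    return (norm.pdf(z / np.sqrt(g)) - norm.pdf((z - 1.0) / np.sqrt(g))) / np.sqrt(g)

def h(g):
    # differential entropy of Z
    return quad(lambda z: -f_Z(z, g) * np.log(f_Z(z, g)), -10, 11)[0]

def J(g):
    # Fisher information of Z
    return quad(lambda z: df_Z(z, g) ** 2 / f_Z(z, g), -10, 11)[0]

g, eps = 0.5, 1e-4
print((h(g + eps) - h(g - eps)) / (2 * eps), 0.5 * J(g))   # the two values agree
```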
Next, we establish the concavity of the WEP when the WF is close to a constant.
Theorem 9.2.
Assume conditions (9.6) and (9.7) and suppose that $\forall \gamma \in (0,1)$
(9.10)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{d}{\mathrm{tr}\hspace{0.2778em}J(\mathbf{Z})}\ge 1+\epsilon .\]
Then there exists $\delta =\delta (\epsilon )$ such that for any WF φ for which there exists $\bar{\varphi }>0$ with $|\varphi -\bar{\varphi }|<\delta $ and $|\nabla \hspace{0.2778em}\varphi |<\delta $, the WEP (9.1) is a concave function of γ. Under the milder assumption
(9.11)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{d}{\mathrm{tr}\hspace{0.2778em}J(\mathbf{Z})}\bigg|_{\gamma =0}\ge 1+\epsilon ,\]
the WEP is a concave function of γ in a small neighbourhood of $\gamma =0$.
Proof.
It is sufficient to check that
(9.12)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }\psi (\gamma )\ge 1\hspace{1em}\mathrm{where}\hspace{2.5pt}\psi (\gamma )={\bigg(\frac{2}{d}\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg)}^{-1}=\varLambda {(\gamma )}^{-1}.\]
By a straightforward calculation
(9.13)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \psi (\gamma )& \displaystyle =d{\big(\mathbb{E}\big[\varphi (\mathbf{Z})\big]\big)}^{2}{\bigg[\frac{\mathrm{d}}{\mathrm{d}\gamma }{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})\mathbb{E}\big[\varphi (\mathbf{Z})\big]-{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})\frac{\mathrm{d}}{\mathrm{d}\gamma }\mathbb{E}\varphi (\mathbf{Z})\bigg]}^{-1},\\{} \displaystyle \frac{\mathrm{d}}{\mathrm{d}\gamma }{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})& \displaystyle =\frac{1}{2}\mathrm{tr}{J_{\varphi }^{w}}(\mathbf{Z})-\frac{1}{2}\frac{\mathrm{d}}{\mathrm{d}\gamma }\mathbb{E}\big[\varphi (\mathbf{Z})\big]+\frac{1}{2}\mathcal{R}(\gamma ).\end{array}\]
These formulas imply
(9.14)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \psi (\gamma )=\frac{d}{\mathrm{tr}\hspace{0.2778em}{J_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}+o(\delta ),\\{} & \displaystyle \text{since}\hspace{1em}1-\delta <\mathbb{E}\big[\varphi (\mathbf{Z})\big]<1+\delta ,\hspace{1em}\big|\mathrm{tr}\hspace{0.2778em}{J_{\varphi }^{\mathrm{w}}}(\mathbf{Z})-\mathrm{tr}\hspace{0.2778em}J(\mathbf{Z})\big|<\delta \hspace{0.2778em}\mathrm{tr}\hspace{0.2778em}J(\mathbf{Z}).\end{array}\]
Next,
(9.15)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }\mathbb{E}\big[\varphi (\mathbf{Z})\big]=\frac{1}{2}\int \varphi (y)\Delta f_{\mathbf{Z}}(\mathbf{y})\mathrm{d}\mathbf{y}\]
and using the Stokes formula one can bound this term by δ. Finally, $|\mathcal{R}(\gamma )|\le \delta $ in view of (9.7), which leads to the claimed result.  □

10 Rates of weighted entropy and information

This section follows [18]. The concept of a rate of the WE or WDE emerges when we work with outcomes in the context of a discrete-time random process (RP):
(10.1)
\[{h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n})=-\mathbb{E}\varphi _{n}\big({\mathbf{X}_{0}^{n-1}}\big)\log \mathbf{p}_{n}\big({\mathbf{X}_{0}^{n-1}}\big):=\mathbb{E}{I_{\varphi _{n}}^{\mathrm{w}}}\big({\mathbf{X}_{0}^{n-1}}\big).\]
Here the WF $\varphi _{n}$ is made dependent on n: two immediate cases are (a) $\varphi _{n}({\mathbf{x}_{0}^{n-1}})={\sum _{j=0}^{n-1}}\psi (x_{j})$ and (b) $\varphi _{n}({\mathbf{x}_{0}^{n-1}})={\prod _{j=0}^{n-1}}\psi (x_{j})$ (an additive and a multiplicative WF, respectively). Next, ${\mathbf{X}_{0}^{n-1}}=(X_{0},\dots ,X_{n-1})$ is a random string generated by an RP. For simplicity, let us focus on RPs taking values in a finite set $\mathcal{X}$. The symbol $\mathbb{P}$ stands for the probability measure of X, and $\mathbb{E}$ denotes the expectation under $\mathbb{P}$. For an RP with IID values, the joint probability of a sample ${\mathbf{x}_{0}^{n-1}}=(x_{0},\dots ,x_{n-1})$ is $\mathbf{p}_{n}({\mathbf{x}_{0}^{n-1}})={\prod _{j=0}^{n-1}}p(x_{j})$, $p(x)=\mathbb{P}(X_{j}=x)$ being the probability of an individual outcome $x\in \mathcal{X}$. In the case of a Markov chain, $\mathbf{p}_{n}({\mathbf{x}_{0}^{n-1}})=\lambda (x_{0}){\prod _{j=1}^{n-1}}p(x_{j-1},x_{j})$. Here $\lambda (x)$ gives an initial distribution and $p(x,y)$ is the transition probability on $\mathcal{X}$; to reflect this fact, we will sometimes use the notation ${h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n},\lambda )$.
\[{I_{\varphi _{n}}^{\mathrm{w}}}\big({\mathbf{x}_{0}^{n-1}}\big):=-\varphi _{n}\big({\mathbf{x}_{0}^{n-1}}\big)\log \mathbf{p}_{n}\big({\mathbf{x}_{0}^{n-1}}\big)\]
is interpreted as a weighted information (WI) contained in/conveyed by outcome ${\mathbf{x}_{0}^{n-1}}$.
In the IID case, the WI and WE admit the following representations. Define $S(p)=-\mathbb{E}[\log p(X)]$ and ${H_{\psi }^{\mathrm{w}}}=-\mathbb{E}[\psi (X)\log p(X)]$ to be the SE and the WE of the one-digit distribution (the capital letter is used to make it distinct from ${h_{\varphi _{n}}^{\mathrm{w}}}$, the multi-time WE).
(A) For an additive WF:
(10.2)
\[{I_{\varphi _{n}}^{\mathrm{w}}}\big({\mathbf{x}_{0}^{n-1}}\big)=-\sum \limits_{j=0}^{n-1}\psi (x_{j})\sum \limits_{l=0}^{n-1}\log p(x_{l})\]
and
(10.3)
\[{h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n})=n(n-1)S(p)\mathbb{E}\big[\psi (X)\big]+n{H_{\psi }^{\mathrm{w}}}(p):=n(n-1)\mathrm{A}_{0}+n\mathrm{A}_{1}.\]
(B) For a multiplicative WF:
(10.4)
\[{I_{\varphi _{n}}^{\mathrm{w}}}\big({\mathbf{x}_{0}^{n-1}}\big)=-\prod \limits_{j=0}^{n-1}\psi (x_{j})\sum \limits_{l=0}^{n-1}\log p(x_{l})\]
and
(10.5)
\[{h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n})=n{H_{\psi }^{\mathrm{w}}}(p){\big[\mathbb{E}\psi (X)\big]}^{n-1}:={\mathrm{B}_{0}^{n-1}}\times n\mathrm{B}_{1}.\]
The values $\mathrm{A}_{0}$, $\mathrm{B}_{0}$ and their analogs in a general situation are referred to as primary rates, and $\mathrm{A}_{1}$, $\mathrm{B}_{1}$ as secondary rates.
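The closed forms (10.3) and (10.5) can be confirmed by direct enumeration over all strings of a small length; a minimal sketch (assumed toy data: a three-letter IID source, per-symbol weight ψ, and n = 4; natural logarithms):

```python
import itertools
import numpy as np

p = np.array([0.5, 0.3, 0.2])             # one-digit distribution on X = {0, 1, 2}
psi = np.array([1.0, 2.0, 0.5])           # per-symbol weight psi(x)
n = 4

S = -np.sum(p * np.log(p))                # Shannon entropy S(p)
H_w = -np.sum(psi * p * np.log(p))        # one-digit WE H_psi^w(p)
E_psi = np.sum(psi * p)

h_add = h_mult = 0.0
for s in itertools.product(range(3), repeat=n):
    idx = np.array(s)
    prob = np.prod(p[idx])                # p_n(x_0^{n-1}) for an IID source
    logp = np.sum(np.log(p[idx]))
    h_add += -np.sum(psi[idx]) * logp * prob     # additive WF, cf. (10.2)
    h_mult += -np.prod(psi[idx]) * logp * prob   # multiplicative WF, cf. (10.4)

print(h_add, n * (n - 1) * S * E_psi + n * H_w)  # equal, confirming (10.3)
print(h_mult, n * H_w * E_psi ** (n - 1))        # equal, confirming (10.5)
```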

10.A WI and WE rates for asymptotically additive WFs

Here we will deal with a stationary RP $\mathbf{X}=(X_{j},j\in \mathbb{Z})$ and use the above notation $\mathbf{p}_{n}({\mathbf{x}_{0}^{n-1}})=\mathbb{P}({\mathbf{X}_{0}^{n-1}}={\mathbf{x}_{0}^{n-1}})$ for the joint probability. We will refer to the limit appearing in the Shannon–McMillan–Breiman (SMB) theorem (see, e.g., [1, 7]), which holds for an ergodic RP:
(10.6)
\[\underset{n\to \infty }{\lim }\bigg[-\frac{1}{n}\log \mathbf{p}_{n}\big({\mathbf{X}_{0}^{n-1}}\big)\bigg]=-\mathbb{E}\log \mathbb{P}\big(X_{0}|{\mathbf{X}_{-\infty }^{-1}}\big):=S,\hspace{1em}\mathbb{P}\text{-a.s.}\]
Here $\mathbb{P}(y|{\mathbf{x}_{-\infty }^{-1}})$ is the conditional PM/DF for $X_{0}=y$ given ${\mathbf{x}_{-\infty }^{-1}}$, an infinite past realization of X. An assumption on the WFs $\varphi _{n}$, called asymptotic additivity (AA), is that
(10.7)
\[\underset{n\to \infty }{\lim }\frac{1}{n}\varphi _{n}\big({\mathbf{X}_{0}^{n-1}}\big)=\alpha ,\hspace{1em}\mathbb{P}\text{-a.s.}\hspace{0.2778em}\text{and/or in}\hspace{2.5pt}\mathrm{L}_{2}(\mathbb{P}).\]
Eqns (10.6), (10.7) lead to the identification of the primary rate: $\mathrm{A}_{0}=\alpha S$.
Theorem 10.1.
Given an ergodic RP X, consider the WI ${I_{\varphi _{n}}^{\mathrm{w}}}({\mathbf{X}_{0}^{n-1}})$ and the WE ${h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n})$ as defined in (10.2), (10.3). Suppose that convergence in (10.7) holds $\mathbb{P}$-a.s. Then:
(I) We have that
(10.8)
\[\underset{n\to \infty }{\lim }\frac{{I_{\varphi _{n}}^{\mathrm{w}}}({\mathbf{X}_{0}^{n-1}})}{{n}^{2}}=\alpha S,\hspace{1em}\mathbb{P}\textit{-a.s.}\]
(II) Furthermore,
  • (a) suppose that the WFs $\varphi _{n}$ exhibit convergence (10.7), $\mathbb{P}$-a.s., with a finite α, and $|\varphi _{n}({\mathbf{X}_{0}^{n-1}})/n|\le c$ where c is a constant independent of n. Suppose also that convergence in Eqn (10.6) holds true. Then we have that
    (10.9)
    \[\underset{n\to \infty }{\lim }\frac{{h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n})}{{n}^{2}}=\alpha S.\]
  • (b) Likewise, convergence in Eqn (10.9) holds true whenever convergences (10.7) and (10.6) hold $\mathbb{P}$-a.s. and $|\log \mathbf{p}_{n}({\mathbf{X}_{0}^{n-1}})/n|\le c$ where c is a constant.
  • (c) Finally, suppose that convergence in (10.6) and (10.7) holds in $\mathrm{L}_{2}(\mathbb{P})$, with finite α and S. Then again, convergence in (10.9) holds true.
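A minimal simulation sketch of (10.8) (illustrative only): for an IID source the string is trivially ergodic, S reduces to the one-digit entropy, and the additive WF $\varphi _{n}={\sum _{j}}\psi (X_{j})$ satisfies (10.7) with $\alpha =\mathbb{E}\psi (X)$ by the law of large numbers. The distribution and WF below are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
p   = np.array([0.5, 0.3, 0.2])        # IID source on {0, 1, 2}
psi = np.array([1.0, 2.0, 0.5])        # psi defining the additive WF phi_n
S, alpha = -np.sum(p*np.log(p)), np.sum(psi*p)

n = 200_000
x = rng.choice(3, size=n, p=p)
I_w = np.sum(psi[x]) * (-np.sum(np.log(p[x])))   # I^w = phi_n * (-log p_n)

print(I_w / n**2, alpha * S)   # the two numbers should be close, cf. Eqn (10.8)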
Example 10.2.
Clearly, the condition of stationarity cannot be dropped. Indeed, let $\varphi _{n}({\mathbf{x}_{0}^{n-1}})=\alpha n$ be an additive WF (with $\psi \equiv \alpha $) and let X be a (non-stationary) Gaussian process with covariances $C=\{C_{\mathit{ij}},i,j\in {\mathbf{Z}_{+}^{1}}\}$. Let $f_{n}$ be the n-dimensional PDF of the vector $(X_{1},\dots ,X_{n})$, with covariance matrix $C_{n}=(C_{\mathit{ij}})_{i,j=1}^{n}$. Then
(10.10)
\[{h_{\varphi _{n}}^{w}}(f_{n})=\frac{\alpha n}{2}\big[n\log (2\pi e)+\log \big(\mathrm{det}(C_{n})\big)\big].\]
Suppose that the eigenvalues $\lambda _{1}\le \cdots \le \lambda _{j}\le \cdots \le \lambda _{n}$ of $C_{n}$ have the order $\lambda _{j}\approx cj$. Then $\log \mathrm{det}(C_{n})={\sum _{j=1}^{n}}\log \lambda _{j}\approx n\log n$ by Stirling's formula, so the second term in (10.10) dominates and ${h_{\varphi _{n}}^{w}}(f_{n})$ grows like ${n}^{2}\log n$; the appropriate normalization is therefore ${({n}^{2}\log n)}^{-1}$ instead of ${n}^{-2}$ as $n\to \infty $.
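The ${n}^{2}\log n$ growth in Example 10.2 is easy to see numerically. The sketch below evaluates (10.10) assuming, purely for illustration, that $C_{n}$ has eigenvalues exactly $\lambda _{j}=cj$, so that $\log \mathrm{det}(C_{n})=n\log c+\log n!$.

import numpy as np
from scipy.special import gammaln      # log(n!) = gammaln(n + 1)

alpha, c = 1.0, 2.0
for n in [10**2, 10**3, 10**4, 10**5]:
    logdet = n*np.log(c) + gammaln(n + 1)                  # log det(C_n) for lambda_j = c*j
    h_w = 0.5*alpha*n*(n*np.log(2*np.pi*np.e) + logdet)    # Eqn (10.10)
    # h_w/n^2 keeps growing (roughly like log n), while h_w/(n^2 log n) tends (slowly) to alpha/2
    print(n, h_w/n**2, h_w/(n**2*np.log(n)))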
Theorem 10.1 can be considered as an analog of the SMB theorem for the primary WE rate in the case of an AA WF. A specification of the secondary rate $\mathrm{A}_{1}$ is given in Theorem 10.3 for an additive WF. The WE rates for multiplicative WFs are studied in Theorem 10.5 for the case where X is a stationary ergodic Markov chain on $\mathcal{X}$.
Theorem 10.3.
Suppose that $\varphi _{n}({\mathbf{x}_{0}^{n-1}})={\sum _{j=0}^{n-1}}\psi (x_{j})$. Let X be a stationary RP with the property that $\forall \hspace{2.5pt}i\in \mathbb{Z}$ there exists the limit
(10.11)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \underset{n\to \infty }{\lim }\sum \limits_{j\in \mathbb{Z}:\hspace{0.1667em}|j+i|\le n}\mathbb{E}\big[\psi (X_{0})\log {p}^{(n+i+j)}\big(X_{j}|{\mathbf{X}_{-n-i}^{j-1}}\big)\big]\\{} & \displaystyle \hspace{1em}=\sum \limits_{j\in \mathbb{Z}}\mathbb{E}\big[\psi (X_{0})\log p\big(X_{j}|{\mathbf{X}_{-\infty }^{j-1}}\big)\big]:=-\mathrm{A}_{1}\end{array}\]
and the last series converges absolutely. Then $\lim _{n\to \infty }\frac{1}{n}{H_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n})=\mathrm{A}_{1}$.

10.B WI and WE rates for asymptotically multiplicative WFs

The WI rate for multiplicative WFs is given in Theorem 10.4 below. Here we use the condition of asymptotic multiplicativity (AM):
(10.12)
\[\underset{n\to \infty }{\lim }{\big[\varphi _{n}\big({\mathbf{X}_{0}^{n-1}}\big)\big]}^{1/n}=\beta ,\hspace{1em}\mathbb{P}\text{-a.s.}\]
Theorem 10.4.
Given an ergodic RP X with a probability distribution $\mathbb{P}$, consider the WI ${I_{\varphi _{n}}^{\mathrm{w}}}({\mathbf{x}_{0}^{n-1}})=-\varphi _{n}({\mathbf{x}_{0}^{n-1}})\log \mathbf{p}_{n}({\mathbf{x}_{0}^{n-1}})$. Suppose that convergence in (10.12) holds $\mathbb{P}$-a.s. Then the following limit holds true:
\[\underset{n\to \infty }{\lim }\frac{1}{n}\log {I_{\varphi _{n}}^{\mathrm{w}}}\big({\mathbf{X}_{0}^{n-1}}\big)=\log \beta ,\hspace{1em}\mathbb{P}\textit{-a.s.}\]
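A simulation sketch of this limit for an IID source (illustrative; the values of p and ψ below are arbitrary): by the law of large numbers, (10.12) holds with $\beta =\exp \{\mathbb{E}\log \psi (X)\}$, and $\frac{1}{n}\log {I_{\varphi _{n}}^{\mathrm{w}}}$ approaches $\log \beta $.

import numpy as np

rng = np.random.default_rng(2)
p   = np.array([0.5, 0.3, 0.2])
psi = np.array([1.1, 0.9, 1.05])       # positive WF, kept near 1 for numerical stability
n   = 100_000

x = rng.choice(3, size=n, p=p)
log_phi = np.sum(np.log(psi[x]))                     # log of the multiplicative phi_n
log_I   = log_phi + np.log(-np.sum(np.log(p[x])))    # log I^w = log phi_n + log(-log p_n)

beta = np.exp(np.sum(p*np.log(psi)))                 # a.s. limit in (10.12) for IID X
print(log_I/n, np.log(beta))                         # the two values should be close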
Theorem 10.5.
Assume that $\varphi _{n}({\mathbf{x}_{0}^{n-1}})={\prod _{j=0}^{n-1}}\psi (x_{j})$, with $\psi (x)>0$, $x\in \mathcal{X}$. Let X be a stationary Markov chain with transition probabilities $p(x,y)>0$, $x,y\in \mathcal{X}$. Then, for any initial distribution λ,
(10.13)
\[\underset{n\to \infty }{\lim }\frac{1}{n}\log {h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n},\lambda )=\mathrm{B}_{0}.\]
Here
(10.14)
\[\mathrm{B}_{0}=\log \mu \]
and $\mu >0$ is the Perron–Frobenius eigenvalue of the matrix $\mathtt{M}=(\psi (x)p(x,y))$, i.e., the spectral radius of $\mathtt{M}$.
The secondary rate $\mathrm{B}_{1}$ in this case is identified through the invariant probabilities $\pi (x)$ of the Markov chain and the Perron–Frobenius eigenvectors of matrices $\mathtt{M}$ and ${\mathtt{M}}^{\mathrm{T}}$.
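A small numerical sketch of Theorem 10.5 (the chain, the WF ψ and the initial distribution below are arbitrary illustrative choices): the Perron–Frobenius eigenvalue μ of $\mathtt{M}=(\psi (x)p(x,y))$ is obtained from a standard eigenvalue routine, and $\frac{1}{n}\log {h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n},\lambda )$, computed by brute force over all strings, approaches $\mathrm{B}_{0}=\log \mu $ (slowly, with corrections of order $\log n/n$).

import itertools
import numpy as np

P   = np.array([[0.7, 0.3],
                [0.4, 0.6]])           # transition probabilities p(x,y)
psi = np.array([2.0, 0.5])             # positive WF
lam = np.array([0.5, 0.5])             # an arbitrary initial distribution

M  = psi[:, None] * P                          # M(x,y) = psi(x) p(x,y)
mu = np.max(np.real(np.linalg.eigvals(M)))     # Perron-Frobenius eigenvalue
B0 = np.log(mu)

def h_w(n):
    # brute-force WE of the n-string: sum_x phi_n(x) p_n(x) (-log p_n(x))
    total = 0.0
    for x in itertools.product(range(2), repeat=n):
        pn, phi = lam[x[0]], 1.0
        for j in range(1, n):
            pn *= P[x[j-1], x[j]]
        for a in x:
            phi *= psi[a]
        total += phi * pn * (-np.log(pn))
    return total

for n in (8, 11, 14):
    print(n, np.log(h_w(n)) / n, B0)   # left value approaches B0 as n grows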
Example 10.6.
Consider a stationary AR(1) sequence $X_{n+1}=\alpha X_{n}+Z_{n+1}$, $n\ge 0$, with $|\alpha |<1$, where the $Z_{n+1}\sim \mathrm{N}(0,\sigma ^{2})$ are independent (take $\sigma ^{2}=1$ for definiteness) and $X_{0}\sim \mathrm{N}(0,c)$, $c=\frac{1}{1-{\alpha }^{2}}$, so that the sequence is stationary. Then, for the multiplicative WF $\varphi _{n}({\mathbf{x}_{0}^{n-1}})={\prod _{j=0}^{n-1}}\psi (x_{j})$,
(10.15)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {h_{\varphi _{n}}^{w}}(f_{n})& \displaystyle =\frac{1}{2}\mathbb{E}\Bigg[\prod \limits_{j=0}^{n-1}\psi (X_{j})\bigg({X_{0}^{2}}-2\alpha X_{0}X_{1}\\{} & \displaystyle \hspace{1em}+\big(1+{\alpha }^{2}\big){X_{1}^{2}}-2\alpha X_{1}X_{2}+\big(1+{\alpha }^{2}\big){X_{2}^{2}}-\cdots \\{} & \displaystyle \hspace{1em}+\big(1+{\alpha }^{2}\big){X_{n-2}^{2}}-2\alpha X_{n-2}X_{n-1}+{X_{n-1}^{2}}-2\log \bigg(\frac{\sqrt{1-{\alpha }^{2}}}{{(2\pi )}^{n/2}}\bigg)\bigg)\Bigg].\end{array}\]
Conditions of Theorem 10.5 may be checked under some restrictions on the WF ψ, see [18].
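The bracket in (10.15) is simply $-2\log f_{n}({\mathbf{X}_{0}^{n-1}})$ written out via the tridiagonal AR(1) precision matrix (with $\sigma ^{2}=1$). A quick numerical check of this identity (a sketch, with arbitrary parameter values):

import numpy as np
from scipy.stats import multivariate_normal

alpha, n = 0.6, 6
rng = np.random.default_rng(1)

# Stationary AR(1) covariance (sigma^2 = 1): C[i,j] = alpha^{|i-j|} / (1 - alpha^2)
idx = np.arange(n)
C = alpha**np.abs(idx[:, None] - idx[None, :]) / (1 - alpha**2)

x = rng.standard_normal(n)             # an arbitrary test point
log_f = multivariate_normal(mean=np.zeros(n), cov=C).logpdf(x)

# Quadratic form from (10.15): x_0^2 - 2a x_0 x_1 + (1+a^2) x_1^2 - ... + x_{n-1}^2
Q = x[0]**2 + x[n-1]**2 + (1 + alpha**2)*np.sum(x[1:n-1]**2) \
    - 2*alpha*np.sum(x[:-1]*x[1:])
bracket = Q - 2*np.log(np.sqrt(1 - alpha**2) / (2*np.pi)**(n/2))

print(np.isclose(bracket, -2*log_f))   # True: the bracket equals -2 log f_n(x)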

Acknowledgments

The article was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) and supported within the subsidy granted to the HSE by the Government of the Russian Federation for the implementation of the Global Competitiveness Programme.

Footnotes

1 AMCTA-2017: talks at the conference “Analytical and Computational Methods in Probability Theory and its Applications”.

References

[1] Cover, T., Thomas, J.: Elements of Information Theory. John Wiley, New York (2006). MR2239987
[2] Dembo, A.: Simple proof of the concavity of the entropy power with respect to added Gaussian noise. IEEE Trans. Inf. Theory 35(4), 887–888 (1989). MR1013698. doi:10.1109/18.32166
[3] Dembo, A., Cover, T., Thomas, J.: Information theoretic inequalities. IEEE Trans. Inf. Theory 37, 1501–1518 (1991). MR1134291. doi:10.1109/18.104312
[4] Frizelle, G., Suhov, Y.: An entropic measurement of queueing behaviour in a class of manufacturing operations. Proc. R. Soc. Lond., Ser. A 457, 1579–1601 (2001). doi:10.1098/rspa.2000.0731
[5] Frizelle, G., Suhov, Y.: The measurement of complexity in production and other commercial systems. Proc. R. Soc. Lond., Ser. A 464, 2649–2668 (2008). doi:10.1098/rspa.2007.0275
[6] Guiasu, S.: Weighted entropy. Rep. Math. Phys. 2, 165–179 (1971). MR0289206. doi:10.1016/0034-4877(71)90002-4
[7] Kelbert, M., Suhov, Y.: Information Theory and Coding by Example. Cambridge University Press, Cambridge (2013). MR3137525. doi:10.1017/CBO9781139028448
[8] Kelbert, M., Suhov, Y.: Continuity of mutual entropy in the limiting signal-to-noise ratio regimes. In: Stochastic Analysis, pp. 281–299. Springer, Berlin (2010). MR2789089. doi:10.1007/978-3-642-15358-7_14
[9] Kelbert, M., Stuhl, I., Suhov, Y.: Weighted entropy and optimal portfolios for risk-averse Kelly investments. Aequ. Math. 91 (2017), in press
[10] Kelbert, M., Stuhl, I., Suhov, Y.: Weighted entropy and its use in Computer Science and beyond. Lecture Notes in Computer Science (in press)
[11] Khan, J.F., Bhuiyan, S.M.: Weighted entropy for segmentation evaluation. Opt. Laser Technol. 57, 236–242 (2014). doi:10.1016/j.optlastec.2013.07.012
[12] Lai, W.K., Khan, I.M., Poh, G.S.: Weighted entropy-based measure for image segmentation. Proc. Eng. 41, 1261–1267 (2012). doi:10.1016/j.proeng.2012.07.309
[13] Lieb, E.: Proof of entropy conjecture of Wehrl. Commun. Math. Phys. 62, 35–41 (1978). MR0506364. doi:10.1007/BF01940328
[14] Nawrocki, D.N., Harding, W.H.: State-value weighted entropy as a measure of investment risk. Appl. Econ. 18, 411–419 (1986). doi:10.1080/00036848600000038
[15] Paksakis, C., Mermigas, S., Pirourias, S., Chondrokoukis, G.: The role of weighted entropy in security quantification. Int. J. Inf. Electron. Eng. 3(2), 156–159 (2013)
[16] Rioul, O.: Information theoretic proofs of entropy power inequality. IEEE Trans. Inf. Theory 57(1), 33–55 (2011). MR2810269. doi:10.1109/TIT.2010.2090193
[17] Shockley, K.R.: Using weighted entropy to rank chemicals in quantitative high throughput screening experiments. J. Biomol. Screen. 19, 344–353 (2014). doi:10.1177/1087057113505325
[18] Suhov, Y., Stuhl, I.: Weighted information and entropy rates (2016). arXiv:1612.09169v1
[19] Suhov, Y., Sekeh, S.: An extension of the Ky-Fan inequality. arXiv:1504.01166
[20] Suhov, Y., Stuhl, I., Sekeh, S., Kelbert, M.: Basic inequalities for weighted entropy. Aequ. Math. 90(4), 817–848 (2016). MR3523101. doi:10.1007/s00010-015-0396-5
[21] Suhov, Y., Sekeh, S., Kelbert, M.: Entropy-power inequality for weighted entropy. arXiv:1502.02188
[22] Suhov, Y., Yasaei Sekeh, S., Stuhl, I.: Weighted Gaussian entropy and determinant inequalities. arXiv:1505.01753v1
[23] Tsui, P.-H.: Ultrasound detection of scatterer concentration by weighted entropy. Entropy 17, 6598–6616 (2015). doi:10.3390/e17106598
[24] Verdú, S., Guo, D.: A simple proof of the entropy-power inequality. IEEE Trans. Inf. Theory 52(5), 2165–2166 (2006). MR2234471. doi:10.1109/TIT.2006.872978
[25] Villani, C.: A short proof of the “concavity of entropy power”. IEEE Trans. Inf. Theory 46, 1695–1696 (2000). MR1768665. doi:10.1109/18.850718
[26] Yang, L., Yang, J., Peng, N., Ling, J.: Weighted information entropy: A method for estimating the complex degree of infrared images’ backgrounds. In: Kamel, M., Campilho, A. (eds.) Image Analysis and Recognition, vol. 3656, pp. 215–222. Springer, Berlin/Heidelberg (2005). MR3157460. doi:10.1007/978-3-642-39094-4
[27] Zamir, R.: A proof of the Fisher information inequality via a data processing argument. IEEE Trans. Inf. Theory 44(3), 1246–1250 (1998). MR1616672. doi:10.1109/18.669301
Copyright
© 2017 The Author(s). Published by VTeX
Open access article under the CC BY license.

Keywords
Weighted entropy; Gibbs inequality; Ky-Fan inequality; Fisher information inequality; entropy power inequality; Lieb’s splitting inequality; rates of weighted entropy and information

MSC2010
94A17
