1 Introduction
This paper is an extended version of an earlier note [10]. We also follow earlier publications discussing related topics: [20, 21, 19, 18]. The Shannon entropy (SE) of a probability distribution p and the Shannon differential entropy (SDE) of a probability density function (PDF) f,
(1.1)
\[h(\mathbf{p})=-\sum \limits_{i}p(x_{i})\log p(x_{i}),\hspace{2em}h(f)=-\int f(x)\log f(x)\mathrm{d}x,\]
are context-free, i.e., they depend not on the nature of the outcomes $x_{i}$ or x but only on the probabilities $p(x_{i})$ or the values $f(x)$. This gives the notion of entropy great flexibility, which explains its successful applications. However, in many situations it seems insufficient, and the context-free property appears as a drawback. Viz., suppose you learn of severe weather conditions in an area far away from your place. Such conditions occur rarely; an event like this has a small probability $p\ll 1$ and conveys a large amount of information $-\log p$. At the same time you hear that a tree near your parking lot in town has fallen and damaged a number of cars. The probability of this event is also low, so the amount of information is again high. However, the value of this information for you is higher than that of the first event. Considerations of this kind motivate a study of weighted information and entropy, making them context-dependent.
Definition 1.1.
Let us define the weighted entropy (WE) as
(1.2)
\[{h_{\varphi }^{\mathrm{w}}}(\mathbf{p})=-\sum \limits_{i}\varphi (x_{i})p(x_{i})\log p(x_{i}).\]
Here a non-negative weight function (WF) $x_{i}\mapsto \varphi (x_{i})$ is introduced, representing a value/utility of an outcome $x_{i}$. A similar approach can be used for the differential entropy of a probability density function (PDF) f. Define the weighted differential entropy (WDE) as
(1.3)
\[{h_{\varphi }^{\mathrm{w}}}(f)=-\int \varphi (x)f(x)\log f(x)\mathrm{d}x.\]
An initial example of a WF φ may be $\varphi (\mathbf{x})=\mathbf{1}$ ($\mathbf{x}\in A$) where A is a particular subset of outcomes (an event). A heuristic use of the WE with such a WF was demonstrated in [4, 5]. Another example repeatedly used below is $f(\mathbf{x})={f_{C}^{\mathrm{No}}}(\mathbf{x})$, a d-dimensional Gaussian PDF with mean $\mathbf{0}$ and covariance matrix C. Here
(1.4)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {h_{\varphi }^{\mathrm{w}}}\big({f_{C}^{\mathrm{No}}}\big)& \displaystyle =\frac{\alpha _{\varphi }(C)}{2}\log \big[{(2\pi )}^{d}\mathrm{det}(C)\big]+\frac{\log e}{2}\mathrm{tr}\big[{C}^{-1}\varPhi _{C,\varphi }\big]\hspace{1em}\text{where}\\{} \displaystyle \alpha _{\varphi }(C)& \displaystyle =\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x}){f_{C}^{\mathrm{No}}}(\mathbf{x})\mathrm{d}\mathbf{x},\hspace{1em}\varPhi _{C,\varphi }=\int _{{\mathbb{R}}^{d}}\mathbf{x}{\mathbf{x}}^{\mathrm{T}}\varphi (\mathbf{x}){f_{C}^{\mathrm{No}}}(\mathbf{x})\mathrm{d}\mathbf{x}.\end{array}\]
For $\varphi (\mathbf{x})=1$ we get the normal SDE $h({f_{C}^{\mathrm{No}}})=\frac{1}{2}\log [{(2\pi \mathrm{e})}^{d}\mathrm{det}\hspace{0.1667em}C]$.
In this note we give a brief introduction to the concept of the weighted entropy. We do not always give proofs, referring the reader to the quoted original papers. Some basic properties of the WE and WDE have been presented in [20]; see also the references therein to early works on the subject. Applications of the WE and WDE to the security quantification of information systems are discussed in [15]. Other domains range from the stock market to image processing; see, e.g., [6, 9, 12, 14, 23, 26].
Throughout this note we assume that the series and integrals in (1.2)–(1.3) and in the subsequent equations converge absolutely, without stating this every time. To unify the presentation, we will often use integrals $\int _{\mathcal{X}}\mathrm{d}\mu $ relative to a reference σ-finite measure μ on a Polish space $\mathcal{X}$ with a Borel σ-algebra $\mathfrak{X}$. In this regard, the acronym PM/DF (probability mass/density function) will be employed. The usual measurability assumptions will also be in place for the rest of the presentation. We also assume that the WF $\varphi >0$ on an open set in $\mathcal{X}$.
In some parts of the presentation, the sums and integrals involving a PM/DF will be written as expectations; this makes it easier to state and use the relevant assumptions and properties. Viz., Eqns (1.2)–(1.3) can be given as ${h_{\varphi }^{\mathrm{w}}}(\mathbf{p})=-\mathbb{E}\varphi (\mathbf{X})\log \mathbf{p}(\mathbf{X})$ and ${h_{\varphi }^{\mathrm{w}}}(f)=-\mathbb{E}\varphi (\mathbf{X})\log f(\mathbf{X})$ where X is a random variable (RV) with the PM/DF p or f. Similarly, in (1.4), $\alpha _{\varphi }(C)=\mathbb{E}\varphi (\mathbf{X})$ and $\boldsymbol{\varPhi }_{C,\varphi }=\mathbb{E}\varphi (\mathbf{X})\mathbf{X}{\mathbf{X}}^{\mathrm{T}}$ where $\mathbf{X}\sim \mathrm{N}(\mathbf{0},C)$.
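For readers who like a computational check, here is a minimal Python sketch (ours, not from the original papers): it evaluates the discrete WE (1.2) and verifies the Gaussian formula (1.4) by Monte Carlo, using natural logarithms (so the factor log e equals 1); the covariance matrix and the indicator weight φ(x) = 1(|x| < 1.5) are arbitrary illustrative choices.
```python
# Minimal sketch (not from the paper): discrete WE (1.2) and a Monte Carlo check of
# the weighted Gaussian entropy formula (1.4). Natural logarithms throughout.
import numpy as np

def weighted_entropy(p, phi):
    """Discrete WE (1.2): -sum_i phi(x_i) p(x_i) log p(x_i)."""
    p, phi = np.asarray(p, float), np.asarray(phi, float)
    mask = p > 0
    return -np.sum(phi[mask] * p[mask] * np.log(p[mask]))

print(weighted_entropy([0.5, 0.3, 0.2], [1.0, 2.0, 0.5]))

rng = np.random.default_rng(0)
d = 2
C = np.array([[2.0, 0.5], [0.5, 1.0]])                             # illustrative covariance
phi = lambda x: (np.linalg.norm(x, axis=-1) < 1.5).astype(float)   # phi(x) = 1(|x| < 1.5)

X = rng.multivariate_normal(np.zeros(d), C, size=200_000)
Cinv = np.linalg.inv(C)
log_f = -0.5 * (np.einsum('ij,jk,ik->i', X, Cinv, X)
                + d * np.log(2 * np.pi) + np.log(np.linalg.det(C)))
h_mc = -np.mean(phi(X) * log_f)                                    # h^w_phi by Monte Carlo

alpha = np.mean(phi(X))                                            # alpha_phi(C) of (1.4)
Phi = (phi(X)[:, None, None] * np.einsum('ij,ik->ijk', X, X)).mean(axis=0)
h_14 = 0.5 * alpha * np.log((2 * np.pi) ** d * np.linalg.det(C)) \
       + 0.5 * np.trace(Cinv @ Phi)                                # right-hand side of (1.4)
print(h_mc, h_14)                                                  # the two values agree
```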
2 The weighted Gibbs inequality
Given two non-negative functions f, g (typically, PM/DFs), define the weighted Kullback–Leibler divergence (or the relative WE, briefly RWE) as
(2.1)
\[{D_{\varphi }^{\mathrm{w}}}(f\| g)=\int _{\mathcal{X}}\varphi (\mathbf{x})f(\mathbf{x})\log \frac{f(\mathbf{x})}{g(\mathbf{x})}\mathrm{d}\mu (\mathbf{x}).\]
Theorem 1.3 from [20] states:
Theorem 2.1.
Suppose that
(2.2)
\[\int _{\mathcal{X}}\varphi (\mathbf{x})\big[f(\mathbf{x})-g(\mathbf{x})\big]\mathrm{d}\mu (\mathbf{x})\ge 0.\]
Then ${D_{\varphi }^{\mathrm{w}}}(f\| g)\ge 0$. Moreover, ${D_{\varphi }^{\mathrm{w}}}(f\| g)=0$ iff $\varphi (\mathbf{x})[\frac{g(\mathbf{x})}{f(\mathbf{x})}-1]=0$ f-a.e.
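The following Python fragment (our illustration, not part of the original text) checks Theorem 2.1 numerically for a pair of Gaussian densities and the indicator weight φ(x) = 1(x > 0): it verifies hypothesis (2.2) by quadrature and then confirms that the RWE (2.1) is indeed non-negative.
```python
# Numerical illustration of Theorem 2.1 (our sketch): f = N(1,1), g = N(0, 1.5^2),
# phi(x) = 1(x > 0). Hypothesis (2.2) holds for this choice, so D_phi^w(f||g) >= 0.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

f = norm(loc=1.0, scale=1.0).pdf
g = norm(loc=0.0, scale=1.5).pdf
phi = lambda x: 1.0 if x > 0 else 0.0          # weight: indicator of the event {x > 0}

cond, _ = quad(lambda x: phi(x) * (f(x) - g(x)), -20, 20)                 # condition (2.2)
rwe, _ = quad(lambda x: phi(x) * f(x) * np.log(f(x) / g(x)), -20, 20)     # RWE (2.1)
print(f"int phi*(f-g) = {cond:.4f}  (>= 0, so Theorem 2.1 applies)")
print(f"D_phi^w(f||g) = {rwe:.4f}  (non-negative, as the theorem asserts)")
```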
Example 2.2.
For an exponential family in the canonical form
(2.3)
\[f_{\underline{\theta }}(\mathbf{x})=h(\mathbf{x})\exp \big(\big\langle \underline{\theta },T(\mathbf{x})\big\rangle -A(\underline{\theta })\big),\hspace{1em}\mathbf{x}\in {\mathbb{R}}^{d},\hspace{2.5pt}\underline{\theta }\in {\mathbb{R}}^{m},\]
with the sufficient statistics $T(\mathbf{x})$ we have
(2.4)
\[{D_{\varphi }^{\mathrm{w}}}(f_{\underline{\theta }_{1}}\| f_{\underline{\theta }_{2}})={e}^{A_{\varphi }(\underline{\theta }_{1})-A(\underline{\theta }_{1})}\big(A(\underline{\theta }_{2})-A(\underline{\theta }_{1})-\big\langle \nabla A_{\varphi }(\underline{\theta }_{1}),\underline{\theta }_{2}-\underline{\theta }_{1}\big\rangle \big),\]
where ∇ stands for the gradient w.r.t. the parameter vector $\underline{\theta }$, and $A_{\varphi }$ denotes a weighted analogue of the log-partition function A.
3 Concavity/convexity of the weighted entropy
Theorems 2.1 and 2.2 from [20] offer the following assertion:
Theorem 3.1.
(a) The WE/WDE functional $f\mapsto {h_{\varphi }^{\mathrm{w}}}(f)$ is concave in the argument f. Namely, for any PM/DFs $f_{1}(x)$, $f_{2}(x)$ and $\lambda _{1},\lambda _{2}\in [0,1]$ such that $\lambda _{1}+\lambda _{2}=1$,
(3.1)
\[{h_{\varphi }^{\mathrm{w}}}(\lambda _{1}f_{1}+\lambda _{2}f_{2})\ge \lambda _{1}{h_{\varphi }^{\mathrm{w}}}(f_{1})+\lambda _{2}{h_{\varphi }^{\mathrm{w}}}(f_{2}).\]
Equality holds iff $\varphi (x)[f_{1}(x)-f_{2}(x)]=0$ for $(\lambda _{1}f_{1}+\lambda _{2}f_{2})$-a.a. x.
(b) However, the RWE functional $(f,g)\mapsto {D_{\varphi }^{\mathrm{w}}}(f\| g)$ is convex: given two pairs of PDFs $(f_{1},f_{2})$ and $(g_{1},g_{2})$,
(3.2)
\[\lambda _{1}{D_{\varphi }^{\mathrm{w}}}(f_{1}\| g_{1})+\lambda _{2}{D_{\varphi }^{\mathrm{w}}}(f_{2}\| g_{2})\ge {D_{\varphi }^{\mathrm{w}}}(\lambda _{1}f_{1}+\lambda _{2}f_{2}\| \lambda _{1}g_{1}+\lambda _{2}g_{2}),\]
with equality iff $\lambda _{1}\lambda _{2}=0$ or $\varphi (x)[f_{1}(x)-f_{2}(x)]=\varphi (x)[g_{1}(x)-g_{2}(x)]=0$ μ-a.e.
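A quick numeric illustration of part (a) (ours; the distributions and the weight below are arbitrary): the WE of a mixture is never smaller than the mixture of WEs.
```python
# Numerical illustration of Theorem 3.1(a) (our sketch): the discrete WE (1.2) of a
# mixture dominates the corresponding mixture of WEs, for any non-negative weight.
import numpy as np

def we(p, phi):
    p, phi = np.asarray(p, float), np.asarray(phi, float)
    m = p > 0
    return -np.sum(phi[m] * p[m] * np.log(p[m]))

rng = np.random.default_rng(1)
p1, p2 = rng.dirichlet(np.ones(6)), rng.dirichlet(np.ones(6))
phi = rng.uniform(0.0, 2.0, size=6)                 # an arbitrary non-negative weight
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    mix = lam * p1 + (1 - lam) * p2
    lhs = we(mix, phi)
    rhs = lam * we(p1, phi) + (1 - lam) * we(p2, phi)
    print(f"lambda={lam:.2f}: h^w(mix)={lhs:.4f} >= {rhs:.4f} : {lhs >= rhs - 1e-12}")
```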
4 Weighted Ky-Fan and Hadamard inequalities
The map $C\mapsto \delta (C):=\log \mathrm{det}(C)$ gives a concave function of a (strictly) positive-definite $(d\times d)$ matrix C: $\delta (C)-\lambda _{1}\delta (C_{1})-\lambda _{2}\delta (C_{2})\ge 0$, where $C=\lambda _{1}C_{1}+\lambda _{2}C_{2}$, $\lambda _{1}+\lambda _{2}=1$ and $\lambda _{1,2}\ge 0$. This is the well-known Ky-Fan inequality. In terms of differential entropies it is equivalent to the bound
(4.1)
\[h\big({f_{C}^{\mathrm{No}}}\big)-\lambda _{1}h\big({f_{C_{1}}^{\mathrm{No}}}\big)-\lambda _{2}h\big({f_{C_{2}}^{\mathrm{No}}}\big)\ge 0\]
and is closely related to a maximising property of the Gaussian differential entropy $h({f_{C}^{\mathrm{No}}})$.
Theorem 4.1 below presents one of the new bounds of Ky-Fan type, in its most explicit form, for the WF $\varphi (\mathbf{x})=\exp ({\mathbf{x}}^{T}\mathbf{t}),\mathbf{t}\in {\mathbb{R}}^{d}$. Cf. Theorem 3.5 from [20]. In this case the identity ${h_{\varphi }^{\mathrm{w}}}({f}^{\mathrm{No}})=\exp (\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t})h({f}^{\mathrm{No}})$ holds true. Introduce a set
(4.2)
\[\mathcal{S}=\big\{\mathbf{t}\in {\mathbb{R}}^{d}:{F}^{(1)}(\mathbf{t})\ge 0,\hspace{2.5pt}{F}^{(2)}(\mathbf{t})\le 0\big\}.\]
Here the functions ${F}^{(1)}$ and ${F}^{(2)}$ incorporate the parameters $C_{i}$ and $\lambda _{i}$:
(4.3)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {F}^{(1)}(\mathbf{t})& \displaystyle =\sum \limits_{i=1}^{2}\lambda _{i}\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C_{i}\mathbf{t}\bigg)-\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t}\bigg),\hspace{1em}\mathbf{t}\in {\mathbb{R}}^{d},\\{} \displaystyle {F}^{(2)}(\mathbf{t})& \displaystyle =\Bigg[\sum \limits_{i=1}^{2}\lambda _{i}\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C_{i}\mathbf{t}\bigg)-\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t}\bigg)\Bigg]\log \big[{(2\pi )}^{d}\mathrm{det}(C)\big]\\{} & \displaystyle \hspace{1em}+\sum \limits_{i=1}^{2}\lambda _{i}\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C_{i}\mathbf{t}\bigg)\mathrm{tr}\big[{C}^{-1}C_{i}\big]-d\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t}\bigg),\hspace{1em}\mathbf{t}\in {\mathbb{R}}^{d}.\end{array}\]
Theorem 4.1.
Given positive-definite matrices $C_{1},C_{2}$ and $\lambda _{1},\lambda _{2}\in [0,1]$ with $\lambda _{1}+\lambda _{2}=1$, set $C=\lambda _{1}C_{1}+\lambda _{2}C_{2}$. Assume $\mathbf{t}\in \mathcal{S}$. Then
(4.4)
\[h\big({f_{C}^{\mathrm{No}}}\big)\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t}\bigg)-\lambda _{1}h\big({f_{C_{1}}^{\mathrm{No}}}\big)\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C_{1}\mathbf{t}\bigg)-\lambda _{2}h\big({f_{C_{2}}^{\mathrm{No}}}\big)\exp \bigg(\frac{1}{2}{\mathbf{t}}^{T}C_{2}\mathbf{t}\bigg)\ge 0,\]
with equality iff $\lambda _{1}\lambda _{2}=0$ or $C_{1}=C_{2}$.
For $\mathbf{t}=\mathbf{0}$ we obtain $\varphi \equiv 1$, and (4.4) coincides with (4.1). Theorem 4.1 is related to the maximisation property of the weighted Gaussian entropy which takes the form of Theorem 4.2. Cf. Example 3.2 in [20].
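The following sketch (ours) checks Theorem 4.1 numerically for d = 2. For the exponential WF the identity quoted before (4.2) gives ${h_{\varphi }^{\mathrm{w}}}({f_{C}^{\mathrm{No}}})=\exp (\frac{1}{2}{\mathbf{t}}^{T}C\mathbf{t})h({f_{C}^{\mathrm{No}}})$, so both sides of (4.4) are elementary; the matrices $C_{1},C_{2}$, the weight $\lambda _{1}$ and the vector t below are illustrative choices for which t lies in $\mathcal{S}$.
```python
# Our numerical check of the weighted Ky-Fan inequality (4.4), d = 2. For the weight
# phi(x) = exp(t.x), h^w_phi(f_C^No) = exp(t'Ct/2) * h(f_C^No); C1, C2, lambda and t
# are illustrative and chosen so that t belongs to the set S of (4.2)-(4.3).
import numpy as np

def h_gauss(C):                               # SDE of N(0, C), natural logarithms
    d = C.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(C))

C1, C2 = np.diag([2.0, 0.5]), np.diag([0.5, 2.0])
lam = 0.3                                     # lambda_1; lambda_2 = 1 - lam
t = np.array([0.0, 0.5])
C = lam * C1 + (1 - lam) * C2
d = C.shape[0]

e = lambda A: np.exp(0.5 * t @ A @ t)         # exp(t' A t / 2)
F1 = lam * e(C1) + (1 - lam) * e(C2) - e(C)   # F^(1)(t) of (4.3)
F2 = (F1 * np.log((2 * np.pi) ** d * np.linalg.det(C))
      + lam * e(C1) * np.trace(np.linalg.inv(C) @ C1)
      + (1 - lam) * e(C2) * np.trace(np.linalg.inv(C) @ C2) - d * e(C))
print(f"F1 = {F1:.5f} (>= 0), F2 = {F2:.5f} (<= 0): t is in S")

lhs = e(C) * h_gauss(C)
rhs = lam * e(C1) * h_gauss(C1) + (1 - lam) * e(C2) * h_gauss(C2)
print(f"(4.4): {lhs:.4f} >= {rhs:.4f}")
```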
Theorem 4.2.
Let $f(\mathbf{x})$ be a PDF on ${\mathbb{R}}^{d}$ with mean $\mathbf{0}$ and $(d\times d)$ covariance matrix C. Let ${f}^{\mathrm{No}}(\mathbf{x})$ stand for the Gaussian PDF, again with the mean $\mathbf{0}$ and covariance matrix C. Define $(d\times d)$ matrices
(4.5)
\[\boldsymbol{\varPhi }=\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x})\mathbf{x}{\mathbf{x}}^{\mathrm{T}}f(\mathbf{x})\mathrm{d}\mathbf{x},\hspace{2em}{\boldsymbol{\varPhi }_{C}^{\mathrm{No}}}=\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x})\mathbf{x}{\mathbf{x}}^{\mathrm{T}}{f_{C}^{\mathrm{No}}}(\mathbf{x})\mathrm{d}\mathbf{x}.\]
Cf. (1.4). Assume that
\[\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x})\big[f(\mathbf{x})-{f_{C}^{\mathrm{No}}}(\mathbf{x})\big]\mathrm{d}\mathbf{x}\ge 0\]
and
\[\log \big[{(2\pi )}^{d}(\mathrm{det}\hspace{0.1667em}C)\big]\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x})\big[f(\mathbf{x})-{f_{C}^{\mathrm{No}}}(\mathbf{x})\big]\mathrm{d}\mathbf{x}+\mathrm{tr}\big[{C}^{-1}\big({\boldsymbol{\varPhi }_{C}^{\mathrm{No}}}-\boldsymbol{\varPhi }\big)\big]\le 0.\]
Then ${h_{\varphi }^{\mathrm{w}}}(f)\le {h_{\varphi }^{\mathrm{w}}}({f_{C}^{\mathrm{No}}})$, with equality iff $\varphi (\mathbf{x})[f(\mathbf{x})-{f_{C}^{\mathrm{No}}}(\mathbf{x})]=0$ a.e.
Theorems 4.1 and 4.2 are part of a series of so-called weighted determinantal inequalities. See [20, 22]. Here we will focus on a weighted version of the Hadamard inequality asserting that for a $(d\times d)$ positive-definite matrix $C=(C_{\mathit{ij}})$, $\mathrm{det}\hspace{0.2778em}C\le {\prod _{j=1}^{d}}C_{\mathit{jj}}$ or $\delta (C)\le {\sum _{j=1}^{d}}\log C_{\mathit{jj}}$. Cf. [20], Theorem 3.7. Let ${f_{C_{\mathit{jj}}}^{\mathrm{No}}}$ stand for the Gaussian PDF on $\mathbb{R}$ with zero mean and variance $C_{\mathit{jj}}$. Set:
\[\alpha =\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x}){f_{C}^{\mathrm{No}}}(\mathbf{x})\mathrm{d}\mathbf{x}\hspace{2.5pt}\text{(cf. (1.4)).}\]
Theorem 4.3.
Assume that
\[\int _{{\mathbb{R}}^{d}}\varphi (\mathbf{x})\Bigg[{f_{C}^{\mathrm{No}}}(\mathbf{x})-\prod \limits_{j=1}^{d}{f_{C_{\mathit{jj}}}^{\mathrm{No}}}(x_{j})\Bigg]\mathrm{d}\mathbf{x}\ge 0.\]
Then, with the matrix $\boldsymbol{\varPhi }=(\varPhi _{\mathit{ij}})$ as in (4.5),
\[\alpha \log \prod \limits_{j=1}^{d}(2\pi C_{\mathit{jj}})\hspace{0.1667em}+\hspace{0.1667em}(\log \mathrm{e})\sum \limits_{j=1}^{d}{C_{\mathit{jj}}^{-1}}\varPhi _{\mathit{jj}}\hspace{0.1667em}-\hspace{0.1667em}\alpha \log \big[{(2\pi )}^{d}(\mathrm{det}\hspace{0.1667em}C)\big]\hspace{0.1667em}-\hspace{0.1667em}(\log \mathrm{e})\mathrm{tr}\hspace{0.1667em}{C}^{-1}\boldsymbol{\varPhi }\hspace{0.1667em}\ge \hspace{0.1667em}0.\]
5 A weighted Fisher information matrix
Let $\mathbf{X}=(X_{1},\dots ,X_{d})$ be a random $(1\times d)$ vector with PDF $f_{\underline{\theta }}(\mathbf{x})=f_{\mathbf{X}}(\mathbf{x},\underline{\theta })$ where $\underline{\theta }=(\theta _{1},\dots ,\theta _{m})\in {\mathbb{R}}^{m}$. Suppose that $\underline{\theta }\to f_{\underline{\theta }}$ is ${C}^{1}$. Define a score vector $S(\mathbf{X},\underline{\theta })=\mathbf{1}(f_{\underline{\theta }}(\mathbf{x})>0)(\frac{\partial }{\partial \theta _{i}}\log f_{\underline{\theta }}(\mathbf{x}),i=1,\dots ,m)$. The $m\times m$ weighted Fisher information matrix (WFIM) is defined as
(5.1)
\[{J_{\varphi }^{\mathrm{w}}}(f_{\underline{\theta }})={J_{\varphi }^{\mathrm{w}}}(\mathbf{X},\underline{\theta })=\mathbb{E}\big[\varphi (\mathbf{X})S(\mathbf{X};\underline{\theta }){S}^{T}(\mathbf{X};\underline{\theta })\big].\]
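As a small illustration of definition (5.1) (ours, not from the paper), take the Gaussian location family $f_{\theta }=\mathrm{N}(\theta ,{\sigma }^{2})$ on $\mathbb{R}$, whose score is $S(x;\theta )=(x-\theta )/{\sigma }^{2}$; with the illustrative weight φ(x) = x² the WFIM has the closed form $\mathbb{E}[\varphi (X){S}^{2}]={\theta }^{2}/{\sigma }^{2}+3$ (an elementary Gaussian moment calculation), which can be compared with a Monte Carlo estimate.
```python
# A quick Monte Carlo sketch of the WFIM (5.1) (ours): Gaussian location family
# f_theta = N(theta, sigma^2), score S = (x - theta)/sigma^2, weight phi(x) = x^2.
import numpy as np

rng = np.random.default_rng(2)
theta, sigma = 1.5, 0.8
X = rng.normal(theta, sigma, size=1_000_000)
score = (X - theta) / sigma**2                       # d/dtheta log f_theta(X)
phi = X**2                                           # weight function phi(x) = x^2

J_w_mc = np.mean(phi * score**2)                     # Monte Carlo estimate of (5.1)
J_w_exact = theta**2 / sigma**2 + 3.0                # closed form for this phi
J_classical = np.mean(score**2)                      # phi = 1 recovers 1/sigma^2
print(J_w_mc, J_w_exact, J_classical, 1 / sigma**2)
```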
Theorem 5.1 (Connection between WFIM and weighted KL-divergence measures).
For smooth families $\{f_{\theta },\theta \in \varTheta \subseteq {\mathbb{R}}^{1}\}$ and a given WF φ, we get
(5.2)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {D_{\varphi }^{\mathrm{w}}}(f_{\theta _{1}}\| f_{\theta _{2}})& \displaystyle =\frac{1}{2}{J_{\varphi }^{\mathrm{w}}}(X,\theta _{1}){(\theta _{2}-\theta _{1})}^{2}+\mathbb{E}_{\theta _{1}}\big[\varphi (X)D_{\theta }\log f_{\theta _{1}}(X)\big](\theta _{1}-\theta _{2})\\{} & \displaystyle \hspace{1em}-\frac{1}{2}\mathbb{E}_{\theta _{1}}\bigg[\varphi (X)\frac{{D_{\theta }^{2}}f_{\theta _{1}}(X)}{f_{\theta _{1}}(X)}\bigg]{(\theta _{2}-\theta _{1})}^{2}+o\big(|\theta _{1}-\theta _{2}{|}^{2}\big)\mathbb{E}_{\theta _{1}}\big[\varphi (X)\big]\end{array}\]
where $D_{\theta }$ stands for $\frac{\partial }{\partial \theta }$.
Proof.
By virtue of a Taylor expansion of $\log f_{\theta _{2}}$ around $\theta _{1}$, we obtain
Here $O_{x}(|\theta _{2}-\theta _{1}{|}^{3})$ denotes the remainder term, which has a hidden dependence on x. Multiply both sides of (5.3) by φ and take expectations, assuming that we can interchange differentiation and expectation appropriately. Next, observe that
Hence
Therefore the claimed result, i.e., (5.2), is achieved. □
(5.3)
\[\log f_{\theta _{2}}=\log f_{\theta _{1}}+D_{\theta }\log f_{\theta _{1}}(\theta _{2}-\theta _{1})+\frac{1}{2}{D_{\theta }^{2}}\log f_{\theta _{1}}{(\theta _{2}-\theta _{1})}^{2}+O_{x}\big(|\underline{\theta }_{2}-\underline{\theta }_{1}{|}^{3}\big).\](5.4)
\[{D_{\theta }^{2}}\log f_{\theta _{1}}=\frac{{D_{\theta }^{2}}f_{\theta _{1}}}{f_{\theta _{1}}}-\frac{{(D_{\theta }f_{\theta _{1}})}^{2}}{{f_{\theta _{1}}^{2}}}.\](5.5)
\[\mathbb{E}_{\theta _{1}}\big[\varphi {D_{\theta }^{2}}\log f_{\theta _{1}}\big]=\mathbb{E}_{\theta _{1}}\bigg[\varphi \frac{{D_{\theta }^{2}}f_{\theta _{1}}}{f_{\theta _{1}}}\bigg]-{J_{\varphi }^{\mathrm{w}}}(f_{\theta _{1}}).\]
6 Weighted entropy power inequality
Let $\mathbf{X}_{1},\mathbf{X}_{2}$ be independent RVs with PDFs $f_{1},f_{2}$ and $\mathbf{X}=\mathbf{X}_{1}+\mathbf{X}_{2}$. The famous Shannon entropy power inequality (EPI) states that
(6.1)
\[h(\mathbf{X}_{1}+\mathbf{X}_{2})\ge h(\mathbf{N}_{1}+\mathbf{N}_{2}),\]
where $\mathbf{N}_{1}$, $\mathbf{N}_{2}$ are Gaussian $\mathrm{N}(\mathbf{0},{\sigma }^{2}\mathbf{I}_{d})$ RVs such that $h(\mathbf{X}_{i})=h(\mathbf{N}_{i})$, $i=1,2$. Equivalently,
(6.2)
\[{e}^{\frac{2}{d}h(\mathbf{X}_{1}+\mathbf{X}_{2})}\ge {e}^{\frac{2}{d}h(\mathbf{X}_{1})}+{e}^{\frac{2}{d}h(\mathbf{X}_{2})},\]
see, e.g., [1, 7]. The EPI is widely used in electronics; for instance, consider an RV Y which satisfies
(6.3)
\[\mathbf{Y}_{n}=\sum \limits_{i=0}^{\infty }a_{i}\mathbf{X}_{n-i},n\in {\mathbf{Z}}^{1},\hspace{1em}\sum \limits_{i=0}^{\infty }|a_{i}{|}^{2}<\infty ,\]
where $a_{i}\in {\mathbf{R}}^{1}$, $\{\mathbf{X}_{i}\}$ are IID RVs. Then the EPI means
(6.4)
\[h(\mathbf{Y})\ge h(\mathbf{X})+\frac{1}{2}\log \Bigg(\sum \limits_{i=0}^{\infty }|a_{i}{|}^{2}\Bigg),\]
with equality if and only if either X is Gaussian or $\mathbf{Y}_{n}=\mathbf{X}_{n-k}$, for some k, that is, the filtering operation is a pure delay. Clearly, a possible extension of the EPI gives more flexibility in signal processing. We are interested in the weighted entropy power inequality (WEPI)
(6.5)
\[\kappa :=\exp \bigg(\frac{2{h_{\varphi }^{\mathrm{w}}}(\mathbf{X}_{1})}{d\mathbb{E}\varphi (\mathbf{X}_{1})}\bigg)+\exp \bigg(\frac{2{h_{\varphi }^{\mathrm{w}}}(\mathbf{X}_{2})}{d\mathbb{E}\varphi (\mathbf{X}_{2})}\bigg)\le \exp \bigg(\frac{2{h_{\varphi }^{\mathrm{w}}}(\mathbf{X})}{d\mathbb{E}\varphi (\mathbf{X})}\bigg).\]
Note that (6.5) coincides with (6.2) when $\varphi \equiv 1$. Let $d=1$; we set
Theorem 6.1.
Given independent RVs $X_{1},X_{2}\in {\mathbb{R}}^{1}$ with PDFs $f_{1},f_{2}$, and the weight function φ, set $X=X_{1}+X_{2}$. Assume the following conditions:
(i)
(6.7)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathbb{E}\varphi (X_{i})& \displaystyle \ge \mathbb{E}\varphi (X)\hspace{1em}\textit{if}\hspace{2.5pt}\kappa \ge 1,\hspace{2.5pt}i=1,2,\\{} \displaystyle \mathbb{E}\varphi (X_{i})& \displaystyle \le \mathbb{E}\varphi (X)\hspace{1em}\textit{if}\hspace{2.5pt}\kappa \le 1,\hspace{2.5pt}i=1,2.\end{array}\]
(ii) With $Y_{1},Y_{2}$ and α as defined in (6.6),
(6.8)
\[{(\cos \alpha )}^{2}{h_{\varphi _{c}}^{\mathrm{w}}}(Y_{1})+{(\sin \alpha )}^{2}{h_{\varphi _{s}}^{\mathrm{w}}}(Y_{2})\le {h_{\varphi }^{\mathrm{w}}}(X),\]
where $\varphi _{c}(x)=\varphi (x\cos \alpha )$ and $\varphi _{s}(x)=\varphi (x\sin \alpha )$.
Then the WEPI holds.
Paying homage to [13] we call (6.8) the weighted Lieb splitting inequality (WLSI). In some cases the WLSI may be effectively checked.
Proof.
Note that
Using (6.8), we have the following inequality
Furthermore, recalling the definition of κ in (6.5) we obtain
By virtue of assumption (6.7), we derive
The definition of κ in (6.5) leads directly to the result. □
(6.10)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {h_{\varphi }^{\mathrm{w}}}(X_{1})& \displaystyle ={h_{\varphi _{c}}^{\mathrm{w}}}(Y_{1})+\mathbb{E}\varphi (X_{1})\log \cos \alpha ,\\{} \displaystyle {h_{\varphi }^{\mathrm{w}}}(X_{2})& \displaystyle ={h_{\varphi _{s}}^{\mathrm{w}}}(Y_{2})+\mathbb{E}\varphi (X_{2})\log \sin \alpha .\end{array}\](6.11)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {h_{\varphi }^{\mathrm{w}}}(X)& \displaystyle \ge {(\cos \alpha )}^{2}\big[{h_{\varphi }^{\mathrm{w}}}(X_{1})-\mathbb{E}\varphi (X_{1})\log \cos \alpha \big]\\{} & \displaystyle \hspace{1em}+{(\sin \alpha )}^{2}\big[{h_{\varphi }^{\mathrm{w}}}(X_{2})-\mathbb{E}\varphi (X_{2})\log \sin \alpha \big].\end{array}\](6.12)
\[{h_{\varphi }^{\mathrm{w}}}(X)\ge \frac{1}{2\kappa }\big[\mathbb{E}\varphi (X_{1})\log \kappa \big]\exp \bigg(\frac{2{h_{\varphi }^{\mathrm{w}}}(X_{1})}{\mathbb{E}\varphi (X_{1})}\bigg)+\frac{1}{2\kappa }\big[\mathbb{E}\varphi (X_{1})\log \kappa \big]\exp \bigg(\frac{2{h_{\varphi }^{\mathrm{w}}}(X_{2})}{\mathbb{E}\varphi (X_{2})}\bigg).\]
Example 6.2.
Let $d=1$ and $X_{1}\sim \mathrm{N}(0,{\sigma _{1}^{2}})$, $X_{2}\sim \mathrm{N}(0,{\sigma _{2}^{2}})$. Then the WLSI (6.8) takes the following form
(6.14)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \log \big[2\pi \big({\sigma _{1}^{2}}+{\sigma _{2}^{2}}\big)\big]\mathbb{E}\varphi (X)+\frac{\log e}{{\sigma _{1}^{2}}+{\sigma _{2}^{2}}}\mathbb{E}\big[{X}^{2}\varphi (X)\big]\\{} & \displaystyle \hspace{1em}\ge {(\cos \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{1}^{2}}}{{(\cos \alpha )}^{2}}\bigg)\bigg]\mathbb{E}\varphi (X_{1})+\frac{{(\cos \alpha )}^{2}\log e}{{\sigma _{1}^{2}}}\mathbb{E}\big[{X_{1}^{2}}\varphi (X_{1})\big]\\{} & \displaystyle \hspace{2em}+{(\sin \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{2}^{2}}}{{(\sin \alpha )}^{2}}\bigg)\bigg]\mathbb{E}\varphi (X_{2})+\frac{{(\sin \alpha )}^{2}\log e}{{\sigma _{2}^{2}}}\mathbb{E}\big[{X_{2}^{2}}\varphi (X_{2})\big].\end{array}\]
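Both sides of (6.14) are one-dimensional Gaussian expectations and are easy to estimate by Monte Carlo. The sketch below (ours) plugs in the classical choice $\cos ^{2}\alpha ={\sigma _{1}^{2}}/({\sigma _{1}^{2}}+{\sigma _{2}^{2}})$ — the angle actually required by Theorem 6.1 is the one from (6.6), which is not reproduced above — and shows that for φ ≡ 1 the two sides coincide up to Monte Carlo error, while for a weight close to a constant (here 1 + 0.1 sin x, an arbitrary choice) the two sides can be compared numerically.
```python
# Our sketch of Example 6.2: evaluate both sides of the Gaussian WLSI (6.14) by
# Monte Carlo, using natural logarithms (log e = 1). The angle alpha below is the
# classical Lieb choice; the angle prescribed by (6.6) is assumed to be close to it.
import numpy as np

rng = np.random.default_rng(3)
s1, s2 = 1.0, 2.0                                    # sigma_1, sigma_2
X1 = rng.normal(0, s1, 500_000); X2 = rng.normal(0, s2, 500_000); X = X1 + X2
ca2 = s1**2 / (s1**2 + s2**2); sa2 = 1 - ca2         # cos^2(alpha), sin^2(alpha)

def sides(phi):
    lhs = (np.log(2*np.pi*(s1**2 + s2**2)) * np.mean(phi(X))
           + np.mean(X**2 * phi(X)) / (s1**2 + s2**2))
    rhs = (ca2 * np.log(2*np.pi*s1**2 / ca2) * np.mean(phi(X1))
           + ca2 * np.mean(X1**2 * phi(X1)) / s1**2
           + sa2 * np.log(2*np.pi*s2**2 / sa2) * np.mean(phi(X2))
           + sa2 * np.mean(X2**2 * phi(X2)) / s2**2)
    return lhs, rhs

print(sides(lambda x: np.ones_like(x)))              # phi = 1: the two sides agree
print(sides(lambda x: 1.0 + 0.1 * np.sin(x)))        # a weight close to a constant
```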
Example 6.3.
Let $d=1$, $X=X_{1}+X_{2}$, $X_{1}\sim \mathrm{U}[a_{1},b_{1}]$ and $X_{2}\sim \mathrm{U}[a_{2},b_{2}]$ be independent. Denote by $\varPhi (x)={\int _{0}^{x}}\varphi (u)\mathrm{d}u$ and $L_{i}=b_{i}-a_{i}$, $i=1,2$. The WDE ${h_{\varphi }^{\mathrm{w}}}(X_{i})=\frac{\varPhi (b_{i})-\varPhi (a_{i})}{L_{i}}\log L_{i}$. Then the inequality $\kappa \ge (\le )1$ takes the form ${L_{1}^{2}}+{L_{2}^{2}}\ge (\le )1$. Suppose for definiteness that $L_{2}\ge L_{1}$ or, equivalently, $C_{1}:=a_{2}+b_{1}\le a_{1}+b_{2}=:C_{2}$. Inequalities (6.7) take the form
The WLSI takes the form
where
Finally, define ${\varPhi }^{\ast }(x)={\int _{0}^{x}}u\varphi (u)\mathrm{d}u$ and note that
(6.15)
\[L_{2}\big[\varPhi (b_{1})-\varPhi (a_{1})\big],\hspace{2em}L_{1}\big[\varPhi (b_{2})-\varPhi (a_{2})\big]\ge (\le )\mathbb{E}\varphi (X).\](6.16)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle -\varLambda +\log (L_{1}L_{2})\mathbb{E}\varphi (X)& \displaystyle \ge {(\cos \alpha )}^{2}\frac{\varPhi (b_{1})-\varPhi (a_{1})}{L_{1}}\log \bigg(\frac{L_{1}}{\cos \alpha }\bigg)\\{} & \displaystyle \hspace{1em}+{(\sin \alpha )}^{2}\frac{\varPhi (b_{2})-\varPhi (a_{2})}{L_{2}}\log \bigg(\frac{L_{2}}{\sin \alpha }\bigg),\end{array}\](6.17)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \varLambda & \displaystyle =\frac{\log L_{1}}{L_{2}}\big[\varPhi (C_{1})-\varPhi (C_{2})\big]+\frac{1}{L_{1}L_{2}}\Bigg[{\int _{A}^{C_{1}}}\varphi (x)(x-A)\log (x-A)\mathrm{d}x\\{} & \displaystyle \hspace{1em}+{\int _{C_{2}}^{B}}\varphi (x)(B-x)\log (B-x)\mathrm{d}x\Bigg],\\{} \displaystyle A& \displaystyle =a_{1}+a_{2},\hspace{2.5pt}B=b_{1}+b_{2}.\end{array}\](6.18)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathbb{E}\varphi (X)& \displaystyle =\frac{1}{L_{1}L_{2}}\big[{\varPhi }^{\ast }(C_{1})-{\varPhi }^{\ast }(A)-{\varPhi }^{\ast }(B)+{\varPhi }^{\ast }(C_{2})\big]\\{} & \displaystyle \hspace{1em}-A\big[\varPhi (C_{1})-\varPhi (A)\big]+L_{1}\big[\varPhi (C_{2})-\varPhi (C_{1})\big]+B\big[\varPhi (B)-\varPhi (C_{2})\big].\end{array}\]
7 The WLSI for the WF close to a constant
Proposition 7.1.
Let $d=1$, $X_{i}\sim \mathrm{N}(\mu _{i},{\sigma _{i}^{2}}),i=1,2$ be independent and $X=X_{1}+X_{2}\sim \mathrm{N}(\mu _{1}+\mu _{2},{\sigma _{1}^{2}}+{\sigma _{2}^{2}})$. Suppose that the WF $x\to \varphi (x)$ is twice continuously differentiable and
(7.1)
\[\big|{\varphi ^{\prime\prime }}(x)\big|\le \epsilon \varphi (x),\hspace{2em}\big|\varphi (x)-\bar{\varphi }\big|\le \epsilon ,\]
where $\epsilon >0$ and $\bar{\varphi }>0$ are constants. Then there exists $\epsilon _{0}>0$ such that for any WF φ satisfying (7.1) with $0<\epsilon <\epsilon _{0}$ the WLSI holds true. Hence, checking the WEPI is reduced to condition (6.7).
For a RV Z, $\gamma >0$ and an independent Gaussian RV $\mathbf{N}\sim \mathrm{N}(\mathbf{0},\mathbf{I}_{d})$ define
where $\| .\| $ stands for the Euclidean norm. According to [24, 8] the differential entropy
For $\mathbf{Z}=\mathbf{Y}_{1},\mathbf{Y}_{2},\mathbf{X}_{1}+\mathbf{X}_{2}$ assume the following conditions
and the uniform integrability: for independent $\mathbf{N},{\mathbf{N}^{\prime }}\sim \mathrm{N}(\mathbf{0},\mathbf{I})$ and any $\gamma >0$ there exists an integrable function $\xi (\mathbf{Z},\mathbf{N})$ such that
(7.2)
\[M(\mathbf{Z};\gamma )=\mathbb{E}\big[{\big\| \mathbf{Z}-\mathbb{E}[\mathbf{Z}|\mathbf{Z}\sqrt{\gamma }+\mathbf{N}]\big\| }^{2}\big],\](7.3)
\[h(\mathbf{Z})=h(\mathbf{N})+\frac{1}{2}{\int _{0}^{\infty }}\big[M(\mathbf{Z};\gamma )-\mathbf{1}_{\{\gamma <1\}}\big]\mathrm{d}\gamma .\](7.4)
\[\mathbb{E}\big[\big|\log f_{\mathbf{Z}}(\mathbf{Z})\big|\big]<\infty ,\mathbb{E}\big[\| \mathbf{Z}{\| }^{2}\big]<\infty \]
Theorem 7.2.
Let $d=1$ and assume conditions (7.4), (7.5). Let $\gamma _{0}$ be a point of continuity of $M(Z;\gamma ),Z=Y_{1},Y_{2},X_{1}+X_{2}$. Suppose that there exists $\delta >0$ such that
(7.6)
\[M(X_{1}+X_{2};\gamma _{0})\ge M(Y_{1},\gamma _{0}){(\cos \alpha )}^{2}+M(Y_{2};\gamma _{0}){(\sin \alpha )}^{2}+\delta .\]
Suppose that for some $\bar{\varphi }>0$ the WF satisfies
Then there exists $\epsilon _{0}=\epsilon _{0}(\gamma _{0},\delta ,f_{1},f_{2})$ such that for any WF satisfying (7.7) with $\epsilon <\epsilon _{0}$ the WLSI holds true.
Proof.
For a constant WF $\bar{\varphi }$, the following inequality is valid (see [8], Lemma 4.2 or [24], Eqns (9) and (10))
However, in view of Theorem 4.1 from [8], the representation (7.3) and inequality (7.6) imply under conditions (7.4) and (7.5) a stronger inequality
Here $c_{0}>0$ and the term of order δ appears from integration in (7.3) in a neighbourhood of the continuity point $\gamma _{0}$. Define ${\varphi }^{\ast }(x)=|\varphi (x)-\bar{\varphi }|$. It is easy to check that
From (7.9) and (7.10) we obtain that for ϵ small enough
i.e., the WLSI holds true. □
(7.8)
\[{(\cos \alpha )}^{2}{h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{1})+{(\sin \alpha )}^{2}{h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{2})\le {h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{1}\cos \alpha +Y_{2}\sin \alpha ).\](7.9)
\[{(\cos \alpha )}^{2}{h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{1})+{(\sin \alpha )}^{2}{h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{2})+c_{0}\delta \le {h_{\bar{\varphi }}^{\mathrm{w}}}(Y_{1}\cos \alpha +Y_{2}\sin \alpha ).\](7.10)
\[{h_{{\varphi }^{\ast }}^{\mathrm{w}}}(Z)<c_{1}\epsilon ,\hspace{1em}Z=X_{1},X_{2},X_{1}+X_{2}.\]
As an example, consider the case where RVs $X_{1},X_{2}$ are normal and WF $\varphi \in {C}^{2}$.
Proposition 7.3.
Let the RVs $X_{i}\sim \mathrm{N}(\mu _{i},{\sigma _{i}^{2}}),i=1,2$ be independent, and $X=X_{1}+X_{2}\sim \mathrm{N}(\mu _{1}+\mu _{2},{\sigma _{1}^{2}}+{\sigma _{2}^{2}})$. Suppose that the WF $x\in \mathbb{R}\to \varphi (x)\ge 0$ is twice continuously differentiable and slowly varying in the sense that $\forall x$,
(7.12)
\[\big|{\varphi ^{\prime\prime }}(x)\big|\le \epsilon \varphi (x),\hspace{2em}\big|\varphi (x)-\bar{\varphi }\big|<\epsilon ,\]
where $\epsilon >0$ and $\bar{\varphi }>0$ are constants. Then there exists $\epsilon _{0}=\epsilon _{0}(\mu _{1},\mu _{2},{\sigma _{1}^{2}},{\sigma _{2}^{2}})>0$ such that for any $0<\epsilon \le \epsilon _{0}$, the WLSI (6.8) with the WF φ holds true.
Proof.
Let α be as in (6.6); to check (6.8), we use Stein’s formula: for $Z\sim \mathrm{N}(0,{\sigma }^{2})$
Owing to the inequality $|\varphi (x)-\bar{\varphi }|<\epsilon $ we have
Here
and
Evidently, under conditions $|{\varphi ^{\prime }}(x)|,|{\varphi ^{\prime\prime }}(x)|<\epsilon \varphi (x)$ we have that $\alpha _{0}<\frac{\pi }{2}-\epsilon $ and $0<\epsilon <{(\sin \alpha )}^{2}$, ${(\cos \alpha )}^{2}<1-\epsilon <1$. We claim that inequality (6.14) is satisfied with φ replaced by $\bar{\varphi }$ and added $\delta >0$:
Here $\delta >0$ is calculated through ϵ and increases to a limit $\delta _{0}>0$ as $\epsilon \downarrow 0$. Indeed, strict concavity of $\log y$ for $y\in [0,\frac{2\pi {\sigma _{1}^{2}}}{{(\cos \alpha )}^{2}}\vee \frac{2\pi {\sigma _{2}^{2}}}{{(\sin \alpha )}^{2}}]$ implies that
On the other hand,
Combining (7.18) and (7.19) one gets (7.17). Now, to check (6.8) with WF φ, in view of (7.17) it suffices to verify
We check (7.20) by brute force, claiming that each term in (7.20) has absolute value $<\delta /6$ when ϵ is small enough. For the terms containing $\mathbb{E}[\varphi (Z)-\bar{\varphi }]$, $Z=X,X_{1},X_{2}$, this follows since $|\varphi -\bar{\varphi }|<\epsilon $. For the terms containing the factor $\mathbb{E}[{Z}^{2}(\varphi (Z)-\bar{\varphi })]$, we use Stein’s formula (7.13) and the condition that $|{\varphi ^{\prime\prime }}(x)|\le \epsilon \varphi (x)$. □
(7.13)
\[\mathbb{E}\big[{Z}^{2}\varphi (Z)\big]={\sigma }^{2}\mathbb{E}\big[\varphi (Z)\big]+{\sigma }^{4}\mathbb{E}\big[{\varphi ^{\prime\prime }}(Z)\big].\](7.14)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \alpha <\alpha _{0}& \displaystyle ={\tan }^{-1}\big(\exp \big({(\bar{\varphi }+\epsilon )}^{2}\big[h_{+}(X_{2})-h_{-}(X_{1})\big]\\{} & \displaystyle \hspace{1em}-{(\bar{\varphi }-\epsilon )}^{2}\big[h_{+}(X_{1})-h_{-}(X_{2})\big]\big)\big).\end{array}\](7.15)
\[h_{\pm }(X_{i})=-\mathbb{E}\big[\mathbf{1}\big(X_{i}\in {A_{\pm }^{i}}\big)\log {f_{X_{i}}^{No}}(X_{i})\big],\hspace{1em}i=1,2.\](7.16)
\[{A_{+}^{i}}=\big\{x\in \mathbf{R}:{f_{i}^{No}}(x)<1\big\},\hspace{2em}{A_{-}^{i}}=\big\{x\in \mathbf{R}:{f_{i}^{No}}(x)>1\big\},\hspace{1em}i=1,2.\](7.17)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \log \big[2\pi \big({\sigma _{1}^{2}}+{\sigma _{2}^{2}}\big)\big]\bar{\varphi }+\frac{\log e}{{\sigma _{1}^{2}}+{\sigma _{2}^{2}}}\mathbb{E}\big[{X}^{2}\big]\bar{\varphi }\\{} & \displaystyle \hspace{1em}\ge {(\cos \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{1}^{2}}}{{(\cos \alpha )}^{2}}\bigg)\bigg]\bar{\varphi }+\frac{{(\cos \alpha )}^{2}\log e}{{\sigma _{1}^{2}}}\mathbb{E}\big[{X_{1}^{2}}\big]\bar{\varphi }\\{} & \displaystyle \hspace{2em}+{(\sin \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{2}^{2}}}{{(\sin \alpha )}^{2}}\bigg)\bigg]\bar{\varphi }+\frac{{(\sin \alpha )}^{2}\log e}{{\sigma _{2}^{2}}}\mathbb{E}\big[{X_{2}^{2}}\big]\bar{\varphi }+\delta .\end{array}\](7.18)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \log \big[2\pi \big({\sigma _{1}^{2}}+{\sigma _{2}^{2}}\big)\big]& \displaystyle \ge {(\cos \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{1}^{2}}}{{(\cos \alpha )}^{2}}\bigg)\bigg]\\{} & \displaystyle \hspace{1em}+{(\sin \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{2}^{2}}}{{(\sin \alpha )}^{2}}\bigg)\bigg]+\delta .\end{array}\](7.19)
\[\frac{1}{{\sigma _{1}^{2}}+{\sigma _{2}^{2}}}\bar{\varphi }\mathbb{E}\big[{X}^{2}\big]=\frac{{(\cos \alpha )}^{2}}{{\sigma _{1}^{2}}}\bar{\varphi }\mathbb{E}\big[{X_{1}^{2}}\big]+\frac{{(\sin \alpha )}^{2}}{{\sigma _{2}^{2}}}\bar{\varphi }\mathbb{E}\big[{X_{2}^{2}}\big].\](7.20)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \log \big[2\pi \big({\sigma _{1}^{2}}+{\sigma _{2}^{2}}\big)\big]\big[\mathbb{E}\varphi (X)-\bar{\varphi }\big]+\frac{\log e}{{\sigma _{1}^{2}}+{\sigma _{2}^{2}}}\mathbb{E}\big[{X}^{2}\big(\varphi (X)-\bar{\varphi }\big)\big]\\{} & \displaystyle \hspace{1em}-{(\cos \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{1}^{2}}}{{(\cos \alpha )}^{2}}\bigg)\bigg]\big[\mathbb{E}\varphi (X_{1})-\bar{\varphi }\big]+\frac{{(\cos \alpha )}^{2}\log e}{{\sigma _{1}^{2}}}\mathbb{E}\big[{X_{1}^{2}}\big(\varphi (X_{1})-\bar{\varphi }\big)\big]\\{} & \displaystyle \hspace{1em}-{(\sin \alpha )}^{2}\bigg[\log \bigg(\frac{2\pi {\sigma _{2}^{2}}}{{(\sin \alpha )}^{2}}\bigg)\bigg][\mathbb{E}\varphi (X_{2})-\bar{\varphi })\hspace{0.1667em}+\frac{{(\sin \alpha )}^{2}\log e}{{\sigma _{2}^{2}}}\mathbb{E}\big[{X_{2}^{2}}\big(\varphi (X_{2})\hspace{0.1667em}-\hspace{0.1667em}\bar{\varphi }\big)\big]<\hspace{0.1667em}\delta .\end{array}\]Similar assertions can be established for other examples of PDFs $f_{1}(x)$ and $f_{2}(x)$, i.e. uniform, exponential, Gamma, Cauchy, etc.
8 A weighted Fisher information inequality
Let $\mathbf{Z}=(\mathbf{X},\mathbf{Y})$ be a pair of independent RVs X and $\mathbf{Y}\in {\mathbb{R}}^{d}$, with sample values $\mathbf{z}=(\mathbf{x},\mathbf{y})\in {\mathbb{R}}^{d}\times {\mathbb{R}}^{d}$ and marginal PDFs $f_{1}(\mathbf{x},\underline{\theta }),f_{2}(\mathbf{y},\underline{\theta })$, respectively. Let $f_{\mathbf{Z}|\mathbf{X}+\mathbf{Y}}(\mathbf{x},\mathbf{y}|\mathbf{u})$ stand for the conditional PDF as
Given a WF $\mathbf{z}=(\mathbf{x},\mathbf{y})\in {\mathbb{R}}^{d}\times {\mathbb{R}}^{d}\mapsto \varphi (\mathbf{z})\ge 0$, we employ the following reduced WFs:
Next, let us introduce the matrices $M_{\varphi }$ and $G_{\varphi }$:
(8.1)
\[f_{\mathbf{Z}|\mathbf{X}+\mathbf{Y}}(\mathbf{x},\mathbf{y}|\mathbf{u})=\frac{f_{1}(\mathbf{x})f_{2}(\mathbf{y})\mathbf{1}(\mathbf{x}+\mathbf{y}=\mathbf{u})}{\int _{{\mathbb{R}}^{n}}f_{1}(\mathbf{v})f_{2}(\mathbf{u}-\mathbf{v})\mathrm{d}\mathbf{v}}.\](8.2)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \varphi (\mathbf{u})& \displaystyle =\int \varphi (\mathbf{v},\mathbf{u}-\mathbf{v})f_{\mathbf{Z}|\mathbf{X}+\mathbf{Y}}(\mathbf{v},\mathbf{u}-\mathbf{v})\mathrm{d}\mathbf{v},\\{} \displaystyle \varphi _{1}(\mathbf{x})& \displaystyle =\int \varphi (\mathbf{x}+\mathbf{y},\mathbf{y})f_{2}(\mathbf{y})\mathrm{d}\mathbf{y},\varphi _{2}(\mathbf{y})=\int \varphi (\mathbf{x},\mathbf{x}+\mathbf{y})f_{1}(\mathbf{x})\mathrm{d}\mathbf{x}.\end{array}\](8.3)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle M_{\varphi }& \displaystyle =\int \varphi (\mathbf{x},\mathbf{y})f_{1}(\mathbf{x})f_{2}(\mathbf{y}){\bigg(\frac{\partial \log f_{1}(\mathbf{x})}{\partial \underline{\theta }}\bigg)}^{\mathrm{T}}\bigg(\frac{\partial \log f_{2}(\mathbf{y})}{\partial \underline{\theta }}\bigg)\mathbf{1}\big(f_{1}(\mathbf{x})f_{2}(\mathbf{y})>0\big)\mathrm{d}\mathbf{x}\mathrm{d}\mathbf{y},\\{} \displaystyle G_{\varphi }& \displaystyle ={\big({J_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}M_{\varphi }{\big({J_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}.\end{array}\]
Note that for $\varphi \equiv 1$ we have $M_{\varphi }=G_{\varphi }=0$ and the classical Fisher information inequality emerges (cf. [27]). Finally, we define
(8.4)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \varXi & \displaystyle :=\varXi _{\varphi _{1},\varphi _{2}}(\mathbf{X},\mathbf{Y})=M_{\varphi }{J_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})G_{\varphi }{(\mathbf{I}-M_{\varphi }G_{\varphi })}^{-1}M_{\varphi }\big[G_{\varphi }{J_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})G_{\varphi }-{J_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big]\\{} & \displaystyle \hspace{1em}+G_{\varphi }{(\mathbf{I}-M_{\varphi }G_{\varphi })}^{-1}M_{\varphi }G_{\varphi }{J_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big[{M_{\varphi }^{-1}}-G_{\varphi }\big]-G_{\varphi }{J_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})G_{\varphi }-G_{\varphi }.\end{array}\]
Theorem 8.1 (A weighted Fisher information inequality (WFII)).
Let X and Y be independent RVs. Assume that ${f_{\mathbf{X}}^{(1)}}=\frac{\partial }{\partial \underline{\theta }}f_{1}$ is not a multiple of ${f_{\mathbf{Y}}^{(1)}}=\frac{\partial }{\partial \underline{\theta }}f_{2}$. Then
Proof.
We use the same methodology as in Theorem 1 from [27]. Recalling Corollary 4.8, (iii) in [20] substitute $\mathtt{P}:=[1,1]$. Therefore for $\mathbf{Z}=(\mathbf{X},\mathbf{Y})$, ${J}^{\mathrm{w}}(\mathbf{Z})$ is an $m\times m$ matrix
Next, we need the following well-known expression for the inverse of a block matrix
where
with equality iff ${f_{\mathbf{X}}^{(1)}}(\mathbf{x})=\frac{\partial \log f_{1}(\mathbf{x})}{\partial \underline{\theta }}\propto \frac{\partial \log f_{2}(\mathbf{y})}{\partial \underline{\theta }}={f_{\mathbf{Y}}^{(1)}}(\mathbf{y})$.
(8.6)
\[{\big({\mathtt{J}_{\theta }^{\mathrm{w}}}(\mathbf{Z})\big)}^{-1}={\left(\begin{array}{c@{\hskip10.0pt}c}{\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})& M_{\varphi }\\{} M_{\varphi }& {\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\end{array}\right)}^{-1}.\](8.7)
\[{\left(\begin{array}{c@{\hskip10.0pt}c}\mathbf{C}_{11}& \mathbf{C}_{21}\\{} \mathbf{C}_{12}& \mathbf{C}_{22}\end{array}\right)}^{-1}=\left(\begin{array}{c@{\hskip10.0pt}c}{\mathbf{C}_{11}^{-1}}+\mathbf{D}_{12}{\mathbf{D}_{22}^{-1}}\mathbf{D}_{21}& -\mathbf{D}_{12}{\mathbf{D}_{22}^{-1}}\\{} -{\mathbf{D}_{22}^{-1}}\mathbf{D}_{21}& {\mathbf{D}_{22}^{-1}}\end{array}\right),\]
\[\begin{array}{r@{\hskip10.0pt}c}& \displaystyle \mathbf{D}_{22}=\mathbf{C}_{22}-\mathbf{C}_{21}{\mathbf{C}_{11}^{-1}}\mathbf{C}_{12},\hspace{2em}\mathbf{D}_{12}={\mathbf{C}_{11}^{-1}}\mathbf{C}_{12},\hspace{2em}\mathbf{D}_{21}=\mathbf{C}_{21}{\mathbf{C}_{11}^{-1}},\\{} & \displaystyle \mathbf{C}_{11}={\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X}),\hspace{2em}\mathbf{C}_{22}={\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y}),\hspace{1em}\text{and}\hspace{1em}\mathbf{C}_{12}=\mathbf{C}_{21}=M_{\varphi }.\end{array}\]
By using the Schwarz inequality, we derive
(8.8)
\[{M_{\varphi }^{2}}\le {\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\hspace{0.2778em}{\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y}),\hspace{1em}\text{or }M_{\varphi }\hspace{0.2778em}G_{\varphi }\le \mathbf{I},\]Define
Thus, owing to (8.7), and in particular for $\mathtt{P}=[1,1]$, we can write
Substituting (8.9) in the above expression, we have
Consequently, simplifying (8.11) yields
By using Corollary 3.4, (iii) from [20], we obtain the property claimed in (8.5):
(8.9)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \delta & \displaystyle :={\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})-\tilde{M}_{\varphi }{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}M_{\varphi }=(\mathtt{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi }){\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\\{} & \displaystyle \Rightarrow {\delta }^{-1}={\big({\mathtt{J}_{\theta _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}{(\mathtt{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}.\end{array}\](8.10)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathtt{P}{\big({\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{Z})\big)}^{-1}{\mathtt{P}}^{T}& \displaystyle ={\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}M_{\varphi }\hspace{0.2778em}{\delta }^{-1}\hspace{0.2778em}M_{\varphi }{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}\\{} & \displaystyle \hspace{1em}-{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}M_{\varphi }\hspace{0.2778em}{\delta }^{-1}-{\delta }^{-1}\hspace{0.2778em}M_{\varphi }\hspace{0.2778em}{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+{\delta }^{-1}.\end{array}\](8.11)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \mathtt{P}{\big({\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{Z})\big)}^{-1}{\mathtt{P}}^{T}\\{} & \displaystyle \hspace{1em}={\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+G_{\varphi }{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}M_{\varphi }\hspace{0.2778em}{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}\\{} & \displaystyle \hspace{2em}-G_{\varphi }{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}-{\big({\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}M_{\varphi }\hspace{0.2778em}{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}\\{} & \displaystyle \hspace{2em}+{\big({\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}\\{} & \displaystyle \hspace{1em}=\big\{{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })\\{} & \displaystyle \hspace{2em}+G_{\varphi }{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}M_{\varphi }{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })-G_{\varphi }\\{} & \displaystyle \hspace{2em}-{\big({\mathtt{J}_{\theta _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}\tilde{M}_{\varphi }{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}(\mathtt{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })\\{} & \displaystyle \hspace{2em}+{\big({\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}\big\}{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}.\end{array}\](8.12)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \mathtt{P}{\big({\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{Z})\big)}^{-1}{\mathtt{P}}^{T}\\{} & \displaystyle \hspace{1em}=\big\{{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+{\big({\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}+\varXi _{\varphi _{1},\varphi _{2}}(\mathbf{X},\mathbf{Y})\big\}{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}.\end{array}\]
\[{\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{X}+\mathbf{Y})\le {\big\{\big[{\big({\mathtt{J}_{\varphi _{1}}^{\mathrm{w}}}(\mathbf{X})\big)}^{-1}+{\big({\mathtt{J}_{\varphi _{2}}^{\mathrm{w}}}(\mathbf{Y})\big)}^{-1}+\varXi _{\varphi _{1},\varphi _{2}}(\mathbf{X},\mathbf{Y})\big]{(\mathbf{I}-M_{\varphi }\hspace{0.2778em}G_{\varphi })}^{-1}\big\}}^{-1}.\]
This concludes the proof. □
Proposition 8.2.
Consider an additive RV $\mathbf{Z}=\mathbf{X}+\mathbf{N}_{\varSigma }$, such that $\mathbf{N}_{\varSigma }\sim \mathrm{N}(\mathbf{0},\varSigma )$ and $\mathbf{N}_{\varSigma }$ is independent of X. Introduce the matrices
(8.13)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle V_{\varphi }(\mathbf{X}|\mathbf{Z})& \displaystyle =\mathbb{E}\big[\varphi \hspace{0.2778em}{\big(\mathbf{X}-\mathbb{E}[\mathbf{X}|\mathbf{Z}]\big)}^{\mathrm{T}}\big(\mathbf{X}-\mathbb{E}(\mathbf{X}|\mathbf{Z})\big)\big],\\{} \displaystyle E_{\varphi }& \displaystyle =\mathbb{E}\big[\varphi \hspace{0.2778em}{\big(\mathbf{Z}-\mathbb{E}[\mathbf{X}|\mathbf{Z}]\big)}^{\mathrm{T}}\big(\mathbf{X}-\mathbb{E}[\mathbf{X}|\mathbf{Z}]\big)\big],\hspace{1em}\overline{E}_{\varphi }=E_{\varphi }+{E_{\varphi }^{\mathrm{T}}}.\end{array}\]
The WFIM of RV Z can be written as
9 The weighted entropy power is a concave function
Let $\mathbf{Z}=\mathbf{X}+\mathbf{Y}$ and $\mathbf{Y}\sim \mathrm{N}(\mathbf{0},\sqrt{\gamma }\mathbf{I}_{d})$. In the literature, several elegant proofs, employing the Fisher information inequality or basic properties of mutual information, have been proposed in order to prove that the entropy power (EP) is a concave function of γ [2, 25]. We are interested in the weighted entropy power (WEP) defined as follows:
Compute the second derivative of the WEP
where
In view of (9.2) the concavity of the WEP is equivalent to the inequality
(9.2)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{{\mathrm{d}}^{2}}{\mathrm{d}{\gamma }^{2}}\exp \bigg\{\frac{2}{d}\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg\}& \displaystyle =\exp \bigg\{\frac{2}{d}\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg\}\\{} & \displaystyle \hspace{1em}\times \bigg[{\bigg(\frac{2}{d}\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg)}^{2}+\bigg(\frac{2}{d}\frac{{\mathrm{d}}^{2}}{\mathrm{d}{\gamma }^{2}}\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg)\bigg]\\{} & \displaystyle =\exp \bigg\{\frac{2}{d}\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg\}\bigg[{\big(\varLambda (\gamma )\big)}^{2}+\frac{\mathrm{d}}{\mathrm{d}\gamma }\varLambda (\gamma )\bigg],\end{array}\](9.3)
\[\varLambda (\gamma )=\frac{2}{d}\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}.\]
In the spirit of the WEP, we shall present a new proof of the concavity of the EP. To this end, let us apply the WFII (8.5) to $\varphi \equiv 1$. Then a straightforward computation gives
Theorem 9.1 (A weighted De Bruijn’s identity).
Let $\mathbf{X}\sim f_{\mathbf{X}}$ be a RV in ${\mathbb{R}}^{d}$, with a PDF $f_{\mathbf{X}}\in {C}^{2}$. For a standard Gaussian RV $\mathbf{N}\sim \mathrm{N}(\mathbf{0},\mathbf{I}_{d})$ independent of X, and given $\gamma >0$, define the RV $\mathbf{Z}=\mathbf{X}+\sqrt{\gamma }\mathbf{N}$ with PDF $f_{\mathbf{Z}}$. Let $\mathbf{V}_{r}$ be the d-sphere of radius r centered at the origin, with surface denoted by $\mathbf{S}_{r}$. Assume that for the given WF φ and $\forall \gamma \in (0,1)$ the relations
(9.6)
\[\int f_{\mathbf{Z}}(\mathbf{x})\big|\ln f_{\mathbf{Z}}(\mathbf{x})\big|\mathrm{d}\mathbf{x}<\infty ,\hspace{2em}\int \big|\nabla \hspace{0.2778em}f_{\mathbf{Z}}(\mathbf{y})\ln f_{\mathbf{Z}}(\mathbf{y})\big|\mathrm{d}\mathbf{y}<\infty \]
and
(9.7)
\[\underset{r\to \infty }{\lim }\int _{\mathbb{S}_{r}}\varphi (\mathbf{y})\log f_{Z}(\mathbf{y})\big(\nabla f_{Z}(\mathbf{y})\big)\mathrm{d}S_{r}=0\]
are fulfilled. Then
(9.8)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})=\frac{1}{2}\hspace{0.2778em}\mathrm{tr}\hspace{0.2778em}{\mathtt{J}_{\varphi }^{\mathrm{w}}}(\mathbf{Z})-\frac{1}{2}\mathbb{E}\bigg[\varphi \hspace{0.2778em}\frac{\Delta f_{Z}(\mathbf{Z})}{f_{Z}(\mathbf{Z})}\bigg]+\frac{\mathcal{R}(\gamma )}{2}.\]
Here
(9.9)
\[\mathcal{R}(\gamma )=\mathbb{E}\big[\nabla \varphi \hspace{0.2778em}\log f_{\mathbf{Z}}(\mathbf{Z}){\big(\nabla \log f_{\mathbf{Z}}(\mathbf{Z})\big)}^{\mathrm{T}}\big].\]
If we assume that $\varphi \equiv 1$, then the equality (9.8) directly implies (9.4). Hence, the standard entropy power is a concave function of γ.
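For φ ≡ 1, (9.8) reduces to the classical De Bruijn identity $\frac{\mathrm{d}}{\mathrm{d}\gamma }h(\mathbf{Z})=\frac{1}{2}\mathrm{tr}\hspace{0.2778em}J(\mathbf{Z})$. The sketch below (ours) verifies this for d = 1 with X uniform on [0,1], for which the density of $Z=X+\sqrt{\gamma }N$ and its score are explicit; the derivative in γ is approximated by a central finite difference over common random numbers.
```python
# Our numerical sketch of the phi = 1 case of Theorem 9.1 (the classical De Bruijn
# identity): X uniform on [0,1], Z = X + sqrt(gamma)*N, N standard Gaussian.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, 1_000_000)
N = rng.normal(0.0, 1.0, 1_000_000)

def f_and_score(z, g):                      # density of Z and its score d/dz log f_Z
    a, b = z / np.sqrt(g), (z - 1.0) / np.sqrt(g)
    f = norm.cdf(a) - norm.cdf(b)
    fp = (norm.pdf(a) - norm.pdf(b)) / np.sqrt(g)
    return f, fp / f

def h(g):                                   # differential entropy of Z by Monte Carlo
    Z = X + np.sqrt(g) * N
    f, _ = f_and_score(Z, g)
    return -np.mean(np.log(f))

g, dg = 0.5, 1e-3
Z = X + np.sqrt(g) * N
_, score = f_and_score(Z, g)
lhs = (h(g + dg) - h(g - dg)) / (2 * dg)    # d/dgamma h(Z), finite difference
rhs = 0.5 * np.mean(score**2)               # (1/2) J(Z)
print(lhs, rhs)                             # the two numbers should be close
```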
Next, we establish the concavity of the WEP when the WF is close to a constant.
Theorem 9.2.
Assume conditions (9.6) and (9.7) and suppose that $\forall \gamma \in (0,1)$
(9.10)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{d}{\mathrm{tr}\hspace{0.2778em}J(\mathbf{Z})}\ge 1+\epsilon .\]
Then $\exists \delta =\delta (\epsilon )$ such that for any WF φ for which $\exists \bar{\varphi }>0$: $|\varphi -\bar{\varphi }|<\delta $, $|\nabla \hspace{0.2778em}\varphi |<\delta $, the WEP (9.1) is a concave function of γ. Under the milder assumption
(9.11)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{d}{\mathrm{tr}\hspace{0.2778em}J(\mathbf{Z})}\bigg|_{\gamma =0}\ge 1+\epsilon ,\]
the WEP is a concave function of γ in a small neighbourhood of $\gamma =0$.
Proof.
It is sufficient to check that
By a straightforward calculation
These formulas imply
Next,
and using the Stokes formula one can bound this term by δ. Finally, $|\mathcal{R}(\gamma )|\le \delta $ in view of (9.7), which leads to the claimed result. □
(9.12)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }\psi (\gamma )\ge 1\hspace{1em}\mathrm{where}\hspace{2.5pt}\psi (\gamma )={\bigg(\frac{2}{d}\frac{\mathrm{d}}{\mathrm{d}\gamma }\frac{{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})}{\mathbb{E}[\varphi (\mathbf{Z})]}\bigg)}^{-1}=\varLambda {(\gamma )}^{-1}.\](9.13)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \psi (\mathbf{Z})& \displaystyle =d{\big(\mathbb{E}\big[\varphi (\mathbf{Z})\big]\big)}^{2}{\bigg[\frac{\mathrm{d}}{\mathrm{d}\gamma }{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})\mathbb{E}\big[\varphi (\mathbf{Z})\big]-{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})\frac{\mathrm{d}}{\mathrm{d}\gamma }\mathbb{E}\varphi (\mathbf{Z})\bigg]}^{-1},\\{} \displaystyle \frac{\mathrm{d}}{\mathrm{d}\gamma }{h_{\varphi }^{\mathrm{w}}}(\mathbf{Z})& \displaystyle =\frac{1}{2}\mathrm{tr}{J_{\varphi }^{w}}(\mathbf{Y})-\frac{1}{2}\frac{\mathrm{d}}{\mathrm{d}\gamma }\mathbb{E}\big[\varphi (\mathbf{Y})\big]+\frac{1}{2}\mathcal{R}(\gamma ).\end{array}\](9.14)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \psi (\gamma )=\frac{d}{\mathrm{tr}\hspace{0.2778em}J_{\varphi }(\mathbf{Z})}+o(\delta ).\\{} & \displaystyle \mathrm{as}\hspace{0.2778em}\hspace{0.2778em}1-\delta <\mathbb{E}\big[\varphi (\mathbf{Z})\big]<1+\delta ,\hspace{1em}\big|\mathrm{tr}\hspace{0.2778em}{J_{\varphi }^{\mathrm{w}}}(\mathbf{Z})-\mathrm{tr}\hspace{0.2778em}J(\mathbf{Z})\big|<\delta \hspace{0.2778em}\mathrm{tr}\hspace{0.2778em}J(\mathbf{Z}).\end{array}\](9.15)
\[\frac{\mathrm{d}}{\mathrm{d}\gamma }\mathbb{E}\big[\varphi (\mathbf{Z})\big]=\frac{1}{2}\int \varphi (y)\Delta f_{\mathbf{Z}}(\mathbf{y})\mathrm{d}\mathbf{y}\]
10 Rates of weighted entropy and information
This section follows [18]. The concept of a rate of the WE or WDE emerges when we work with outcomes in a context of a discrete-time random process (RP):
(10.1)
\[{h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n})=-\mathbb{E}\varphi _{n}\big({\mathbf{X}_{0}^{n-1}}\big)\log \mathbf{p}_{n}\big({\mathbf{X}_{0}^{n-1}}\big):=\mathbb{E}{I_{\varphi _{n}}^{\mathrm{w}}}\big({\mathbf{X}_{0}^{n-1}}\big).\]
Here the WF $\varphi _{n}$ is made dependent on n: two immediate cases are where (a) $\varphi _{n}({\mathbf{x}_{1}^{n}})={\sum _{j=0}^{n}}\psi (x_{j})$ and (b) $\varphi _{n}({\mathbf{x}_{1}^{n}})={\prod _{j=0}^{n}}\psi (x_{j})$ (an additive and multiplicative WF, respectively). Next, ${\mathbf{X}_{0}^{n-1}}=(X_{0},\dots ,X_{n-1})$ is a random string generated by an RP. For simplicity, let us focus on RPs taking values in a finite set $\mathcal{X}$. Symbol $\mathbb{P}$ stands for the probability measure of X, and $\mathbb{E}$ denotes the expectation under $\mathbb{P}$. For an RP with IID values, the joint probability of a sample ${\mathbf{x}_{0}^{n-1}}=(x_{0},\dots ,x_{n-1})$ is $\mathbf{p}_{n}({\mathbf{x}_{0}^{n-1}})={\prod _{j=0}^{n-1}}p(x_{j})$, $p(x)=\mathbb{P}(X_{j}=x)$ being the probability of an individual outcome $x\in \mathcal{X}$. In the case of a Markov chain, $\mathbf{p}_{n}({\mathbf{x}_{0}^{n-1}})=\lambda (x_{0}){\prod _{j=1}^{n}}p(x_{j-1},x_{j})$. Here $\lambda (x)$ gives an initial distribution and $p(x,y)$ is the transition probability on $\mathcal{X}$; to reflect this fact, we will sometimes use the notation ${h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n},\lambda )$. The quantity
\[{I_{\varphi _{n}}^{\mathrm{w}}}\big({\mathbf{x}_{0}^{n-1}}\big):=-\varphi _{n}\big({\mathbf{x}_{0}^{n-1}}\big)\log \mathbf{p}_{n}\big({\mathbf{x}_{0}^{n-1}}\big)\]
is interpreted as a weighted information (WI) contained in/conveyed by outcome ${\mathbf{x}_{0}^{n-1}}$.In the IID case, the WI and WE admit the following representations. Define $S(p)=-\mathbb{E}[\log p(X)]$ and ${H_{\psi }^{\mathrm{w}}}=-\mathbb{E}[\psi (X)\log p(X)]$ to be the SE and the WE, of the one-digit distribution (the capital letter is used to make it distinct from ${h_{\varphi _{n}}^{\mathrm{w}}}$, the multi-time WE).
(A) For an additive WF:
and
(B) For a multiplicative WF:
(10.4)
\[{I_{\varphi _{n}}^{\mathrm{w}}}\big({\mathbf{x}_{0}^{n-1}}\big)=-\prod \limits_{j=0}^{n-1}\psi (x_{j})\sum \limits_{l=0}^{n-1}\log p(x_{l})\]
and
(10.5)
\[{h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n})=n{H_{\psi }^{\mathrm{w}}}(p){\big[\mathbb{E}\varphi (X)\big]}^{n-1}:={\mathrm{B}_{0}^{n-1}}\times n\mathrm{B}_{1}.\]
The values $\mathrm{A}_{0}$, $\mathrm{B}_{0}$ and their analogs in a general situation are referred to as primary rates, and $\mathrm{A}_{1}$, $\mathrm{B}_{1}$ as secondary rates.
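The representation (10.5) is easy to confirm by brute force for a small alphabet. The sketch below (ours) enumerates all strings of length n = 4 over a three-letter alphabet with an arbitrary distribution p and one-letter weight ψ, and compares the directly computed WE (10.1) with $n{H_{\psi }^{\mathrm{w}}}(p){[\mathbb{E}\psi (X)]}^{n-1}$.
```python
# Our sketch verifying the identity (10.5) by enumeration: IID source, multiplicative
# WF phi_n(x) = prod_j psi(x_j). Alphabet, p and psi are arbitrary choices.
import itertools
import numpy as np

p = np.array([0.5, 0.3, 0.2])                 # one-digit distribution on {0,1,2}
psi = np.array([1.0, 2.0, 0.5])               # one-letter weight psi(x)
n = 4

# Left-hand side: direct evaluation of (10.1) with phi_n = prod_j psi(x_j).
lhs = 0.0
for word in itertools.product(range(len(p)), repeat=n):
    prob = np.prod(p[list(word)])
    weight = np.prod(psi[list(word)])
    lhs += -weight * prob * np.log(prob)

# Right-hand side: the closed form (10.5), with B_0 = E psi(X) and B_1 = H_psi^w.
H_psi = -np.sum(psi * p * np.log(p))          # one-digit WE
rhs = n * H_psi * np.sum(psi * p) ** (n - 1)
print(lhs, rhs)                               # the two values coincide
```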
10.A WI and WE rates for asymptotically additive WFs
Here we will deal with a stationary RP $\mathbf{X}=(X_{j},j\in \mathbb{Z})$ and use the above notation $\mathbf{p}_{n}({\mathbf{x}_{0}^{n-1}})=\mathbb{P}({\mathbf{X}_{0}^{n-1}}={\mathbf{x}_{0}^{n-1}})$ for the joint probability. We will refer to the limit present in the Shannon–McMillan–Breiman (SMB) theorem (see, e.g., [1, 7]) taking place for an ergodic RP:
(10.6)
\[\underset{n\to \infty }{\lim }\bigg[-\frac{1}{n}\log \mathbf{p}_{n}\big({\mathbf{X}_{0}^{n-1}}\big)\bigg]=-\mathbb{E}\log \mathbb{P}\big(X_{0}|{\mathbf{X}_{-\infty }^{-1}}\big):=S,\hspace{1em}\mathbb{P}\text{-a.s.}\]
Here $\mathbb{P}(y|{\mathbf{x}_{-\infty }^{-1}})$ is the conditional PM/DF for $X_{0}=y$ given ${\mathbf{x}_{-\infty }^{-1}}$, an infinite past realization of X. An assumption upon WFs $\varphi _{n}$ called asymptotic additivity (AA) is that
(10.7)
\[\underset{n\to \infty }{\lim }\frac{1}{n}\varphi _{n}\big({\mathbf{X}_{0}^{n-1}}\big)=\alpha ,\hspace{1em}\mathbb{P}\text{-a.s.}\hspace{0.2778em}\text{and/or in}\hspace{2.5pt}\mathrm{L}_{2}(\mathbb{P}).\]
Eqns (10.6), (10.7) lead to the identification of the primary rate: $\mathrm{A}_{0}=\alpha S$.
Theorem 10.1.
Given an ergodic RP X, consider the WI ${I_{\varphi _{n}}^{\mathrm{w}}}({\mathbf{X}_{0}^{n-1}})$ and the WE ${H_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n})$ as defined in (10.2), (10.3). Suppose that convergence in (10.7) holds $\mathbb{P}$-a.s. Then:
(I) We have that
Example 10.2.
Clearly, the condition of stationarity cannot be dropped. Indeed, let $\varphi _{n}({x_{0}^{n-1}})=\alpha n$ be an additive WF and X be a (non-stationary) Gaussian process with covariances $C=\{C_{\mathit{ij}},i,j\in {\mathbf{Z}_{+}^{1}}\}$. Let $f_{n}$ be the n-dimensional PDF of the vector $(X_{1},\dots ,X_{n})$. Then
(10.10)
\[{h_{\varphi _{n}}^{w}}(f_{n})=\frac{\alpha n}{2}\big[n\log (2\pi e)+\log \big(\mathrm{det}(C_{n})\big)\big].\]
Suppose that the eigenvalues $\lambda _{1}\le \cdots \le \lambda _{j}\le \cdots \le \lambda _{n}$ of $C_{n}$ have the order $\lambda _{j}\approx cj$. Then by Stirling’s formula the second term in (10.10) dominates and the scaling of ${h_{\varphi _{n}}^{w}}(f_{n})$ is ${({n}^{2}\log n)}^{-1}$ instead of ${n}^{-2}$ as $n\to \infty $.
Theorem 10.1 can be considered as an analog of the SMB theorem for the primary WE rate in the case of an AA WF. A specification of the secondary rate $\mathrm{A}_{1}$ is given in Theorem 10.3 for an additive WF. The WE rates for multiplicative WFs are studied in Theorem 10.4 for the case where X is a stationary ergodic Markov chain on $\mathcal{X}$.
Theorem 10.3.
Suppose that $\varphi _{n}({\mathbf{x}_{0}^{n-1}})={\sum _{j=0}^{n-1}}\psi (x_{j})$. Let X be a stationary RP with the property that $\forall \hspace{2.5pt}i\in \mathbb{Z}$ there exists the limit
(10.11)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \underset{n\to \infty }{\lim }\sum \limits_{j\in \mathbb{Z}:\hspace{0.1667em}|j+i|\le n}\mathbb{E}\big[\psi (X_{0})\log {p}^{(n+i+j)}\big(X_{j}|{\mathbf{X}_{-n-i}^{j-1}}\big)\big]\\{} & \displaystyle \hspace{1em}=\sum \limits_{j\in \mathbb{Z}}\mathbb{E}\big[\psi (X_{0})\log p\big(X_{j}|{\mathbf{X}_{-\infty }^{j-1}}\big)\big]:=-\mathrm{A}_{1}\end{array}\]
and the last series converges absolutely. Then $\lim _{n\to \infty }\frac{1}{n}{H_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n})=\mathrm{A}_{1}$.
10.B WI and WE rates for asymptotically multiplicative WFs
The WI rate is given in Theorem 10.3. Here we use the condition of asymptotic multiplicativity:
Theorem 10.4.
Given an ergodic RP X with a probability distribution $\mathbb{P}$, consider the WI ${I_{\varphi _{n}}^{\mathrm{w}}}({\mathbf{x}_{0}^{n-1}})=-\varphi _{n}({\mathbf{x}_{0}^{n-1}})\log \mathbf{p}_{n}({\mathbf{x}_{0}^{n-1}})$. Suppose that convergence in (10.12) holds $\mathbb{P}$-a.s. Then the following limit holds true:
Theorem 10.5.
Assume that $\varphi ({\mathbf{x}_{0}^{n-1}})={\prod _{j=0}^{n-1}}\psi (x_{j})$, with $\psi (x)>0$, $x\in \mathcal{X}$. Let X be a stationary Markov chain with transition probabilities $p(x,y)>0$, $x,y\in \mathcal{X}$. Then, for any initial distribution λ,
(10.13)
\[\underset{n\to \infty }{\lim }\frac{1}{n}\log {h_{\varphi _{n}}^{\mathrm{w}}}(\mathbf{p}_{n},\lambda )=\mathrm{B}_{0}.\]
Here
and $\mu >0$ is the Perron–Frobenius eigenvalue of the matrix $\mathtt{M}=(\psi (x)p(x,y))$ coinciding with the norm of $\mathtt{M}$.
The secondary rate $\mathrm{B}_{1}$ in this case is identified through the invariant probabilities $\pi (x)$ of the Markov chain and the Perron–Frobenius eigenvectors of the matrices $\mathtt{M}$ and ${\mathtt{M}}^{\mathrm{T}}$.
Example 10.6.
Consider a stationary sequence $X_{n+1}=\alpha X_{n}+Z_{n+1},n\ge 0$, where $Z_{n+1}\sim \mathrm{N}(0,{\sigma }^{2})$ are independent, and $X_{0}\sim \mathrm{N}(0,c)$, $c=\frac{1}{1-{\alpha }^{2}}$. Then
Conditions of Theorem 10.5 may be checked under some restrictions on the WF ψ, see [18].