1 Introduction
There is a vast literature on branching processes. Here we cite the monographs [1, 3, 12]; moreover, we also cite the monographs [18] for the multitype case, [10], which focuses on statistical inference, and [13] and [15] for applications in biology.
The simplest example of a branching process is the Galton–Watson process. We consider the case of a population that has a unique individual at the beginning, and all the individuals (of all generations) live for a unitary time; moreover, at the end of their lifetimes, every individual of the population (of every generation) produces a random number of new individuals, acting independently of all the rest, according to a specific fixed distribution. So, if we consider a sequence of random variables $\{V_{n}:n\ge 0\}$ such that $V_{n}$ is the population size at time n (for all $n\ge 0$), we have $V_{0}=1$ and
\[V_{n}=\sum \limits_{i=1}^{V_{n-1}}X_{n,i}\hspace{1em}(\text{for all}\hspace{2.5pt}n\ge 1),\]
where $\{X_{n,i}:n,i\ge 1\}$ is a family of nonnegative integer-valued i.i.d. random variables. In other words, $X_{n,1},\dots ,X_{n,V_{n-1}}$ represent the offspring generated at time n by each of $V_{n-1}$ individuals that live at time $n-1$. We recall some other preliminaries on the Galton–Watson process in Section 2, where, in particular, we consider a slightly different notation to allow the case with a random initial population (instead of the case with a unitary initial population cited before).
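The recursion above can be sketched with a minimal simulation (an illustrative sketch, not part of the paper; the subcritical offspring law below is an arbitrary choice):

```python
import random

def galton_watson(offspring, generations, v0=1):
    """Return [V_0, ..., V_generations], where each individual of
    generation n-1 produces offspring() children, independently."""
    sizes = [v0]
    for _ in range(generations):
        # V_n is the sum of V_{n-1} i.i.d. offspring numbers
        sizes.append(sum(offspring() for _ in range(sizes[-1])))
    return sizes

rng = random.Random(42)
# illustrative subcritical offspring law: one child with probability 0.4
path = galton_watson(lambda: 1 if rng.random() < 0.4 else 0, 20)
```

Note that, once the population size hits 0, it stays at 0 (the empty sum is 0), which is the extinction event discussed below.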
In this paper, we present large deviation results. The theory of large deviations is a collection of techniques that give asymptotic estimates of small probabilities on an exponential scale (see, e.g., [6] as a reference). We recall some preliminaries in Section 2. The literature on large deviations for branching processes is wide. Here we essentially recall some references with results concerning the Galton–Watson process.
Several references study the large-time behavior in the supercritical case, namely the case where the offspring mean μ is strictly larger than one (in such a case, the extinction probability is strictly less than one). Here we recall [2] (see also [4] for the multitype case); [5], where the main object of study is the tails of $W:=\lim _{n\to \infty }V_{n}/{\mu }^{n}$; [19], with a careful analysis based on harmonic moments of $\{V_{n}:n\ge 0\}$; [20] (and [21]), with some conditional large deviation results based on some local limit theorems; and [8], where the central role of some “lower deviation probabilities” is highlighted for the study of the asymptotic behavior of the Lotka–Nagaev estimator $V_{n+1}/V_{n}$ of μ.
Other references study the most likely paths to extinction at some time $n_{0}$ when the initial population k is large. The idea is to represent a branching process with initial population equal to k as a sum of k i.i.d. replications of the process with a unitary initial population; in this case, Cramér’s theorem for empirical means of i.i.d. random variables (on ${\mathbb{R}}^{n_{0}}$) plays a crucial role. A most likely path to extinction in [16] (see also [17]) is a trajectory that minimizes the rate function among the paths that reach the level 0 at time $n_{0}$. A generalization of this concept to the most likely paths to reach a level $b\ge 0$ can be found in [11].
In this paper, we are interested in a different direction. Namely, we are interested in the empirical means of i.i.d. replications of the total progeny of a Galton–Watson process. The total progenies of branching processes are studied in several references: here we cite the old references [7, 14, 22] for a Galton–Watson process, and [9] (see Section 2.2) among the references concerning different branching processes. The total progeny of a Galton–Watson process is an almost surely finite random variable when the extinction occurs almost surely, and therefore the supercritical case will not be considered. Some relationships between the offspring distribution and the total progeny distribution of a Galton–Watson process are well known (see (3) for the probability mass functions and (4) for the probability generating functions).
A new relationship is provided by Proposition 1, where we illustrate how the rate function for the empirical means of total progenies can be expressed in terms of the analogous rate function for the empirical means of a single progeny. This is a quite natural problem to investigate from the point of view of large deviations, and, as one can expect, (4) plays an important role in the proof; in fact, the large deviation rate function for empirical means of i.i.d. random variables (provided by Cramér’s theorem recalled below; see Theorem 1) is given by the Legendre transform of the logarithm of the (common) moment generating function of the random variables. Moreover, the relationship provided by Proposition 1 can be of interest in information theory because the involved rate functions can be expressed in terms of suitable relative entropies (or Kullback–Leibler divergences); see, for example, [23] for a discussion of the rate function expressions in terms of the relative entropy.
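For instance (a standard computation, not carried out in the paper), for Bernoulli(p) summands the Legendre transform of the logarithmic moment generating function is exactly a relative entropy:
\[\underset{\alpha \in \mathbb{R}}{\sup }\big\{\alpha x-\log \big(1-p+p{e}^{\alpha }\big)\big\}=x\log \frac{x}{p}+(1-x)\log \frac{1-x}{1-p}\hspace{1em}\text{for}\hspace{2.5pt}x\in [0,1],\]
namely the Kullback–Leibler divergence of the Bernoulli(x) law from the Bernoulli(p) law.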
Another result presented in this paper is Proposition 2, which is a version of Proposition 1 where the initial population $V_{0}$ is a random variable with a suitable distribution. Finally, in Propositions 3 and 4, we prove large deviation results for some estimators of the offspring mean μ in terms of i.i.d. replications of the total progeny and of the initial population (we consider the case where the initial population $V_{0}$ is a random variable as in Proposition 2).
We conclude with the outline of the paper. We start with some preliminaries in Section 2. In Section 3, we prove the results concerning the large deviation rate functions related to Cramér’s theorem. Finally, in Section 4, we prove the large deviation results for the estimators of the offspring mean μ.
2 Preliminaries
We start with some preliminaries on the Galton–Watson process. In the second part, we recall some preliminaries on large deviations.
2.1 Preliminaries on Galton–Watson process
Here we introduce a slightly different notation, and, moreover, we recall some preliminaries in order to define the total progeny of a Galton–Watson process.
We start with some notation concerning the offspring distribution (note that $\mu _{f}$ defined below coincides with μ in the Introduction):
\[p_{h}:=P(X_{1,1}=h)\hspace{2.5pt}(\text{for all}\hspace{2.5pt}h\ge 0),\hspace{1em}f(s):=\sum \limits_{h\ge 0}p_{h}{s}^{h},\hspace{1em}\mu _{f}:={f^{\prime }}(1)=\sum \limits_{h\ge 1}hp_{h}.\]
Moreover, we introduce the analogous items for the initial population:
(1)
\[\{q_{n}:n\ge 0\},\hspace{1em}g(s):=\sum \limits_{n\ge 0}q_{n}{s}^{n},\hspace{1em}\mu _{g}:={g^{\prime }}(1)=\sum \limits_{n\ge 1}nq_{n},\]
where $\{q_{n}:n\ge 0\}$ is the distribution of the initial population. So, from now on, we consider the following slightly different notation:
\[\big\{{V_{n}^{f,g}}:n\ge 0\big\}\]
(in place of $\{V_{n}:n\ge 0\}$ presented before). More precisely, ${V_{0}^{f,g}}$ is a random variable with distribution $\{q_{n}:n\ge 0\}$ (independent of the offspring variables), and, for all $n\ge 1$,
\[{V_{n}^{f,g}}=\sum \limits_{i=1}^{{V_{n-1}^{f,g}}}X_{n,i}.\]
Remark 1.
Note that $\{{V_{n}^{f,g}}:n\ge 0\}$ here corresponds to $\{V_{n}:n\ge 0\}$ presented in the Introduction if $q_{1}=1$ or, equivalently, if $g=\mathrm{id}$ (i.e. $g(s)=s$ for all s).
If we consider the extinction probability
\[{p_{\mathrm{ext}}^{f,g}}:=P\big(\big\{{V_{n}^{f,g}}=0\hspace{2.5pt}\text{for some}\hspace{2.5pt}n\ge 0\big\}\big),\]
then it is known that ${p_{\mathrm{ext}}^{f,\mathrm{id}}}$ is the smallest solution of the equation $f(s)=s$ in $[0,1]$; moreover, if $p_{0}>0$, then we have ${p_{\mathrm{ext}}^{f,\mathrm{id}}}=1$ if $\mu _{f}\le 1$ and ${p_{\mathrm{ext}}^{f,\mathrm{id}}}\in (0,1)$ if $\mu _{f}>1$. More generally, we have
\[{p_{\mathrm{ext}}^{f,g}}=q_{0}+\sum \limits_{n\ge 1}{\big({p_{\mathrm{ext}}^{f,\mathrm{id}}}\big)}^{n}q_{n}=g\big({p_{\mathrm{ext}}^{f,\mathrm{id}}}\big),\]
and, if $q_{0}<1$ (we obviously have ${p_{\mathrm{ext}}^{f,g}}=1$ if $q_{0}=1$), then we have the following cases:
\[\begin{array}{l@{\hskip10.0pt}l}{p_{\mathrm{ext}}^{f,g}}=g(0)=q_{0}& \hspace{2.5pt}\text{if}\hspace{2.5pt}p_{0}=0;\\{} {p_{\mathrm{ext}}^{f,g}}=g(1)=1& \hspace{2.5pt}\text{if}\hspace{2.5pt}p_{0}>0\hspace{2.5pt}\text{and}\hspace{2.5pt}\mu _{f}\le 1;\\{} {p_{\mathrm{ext}}^{f,g}}\in (q_{0},1)& \hspace{2.5pt}\text{if}\hspace{2.5pt}p_{0}>0\hspace{2.5pt}\text{and}\hspace{2.5pt}\mu _{f}>1.\end{array}\]
If $p_{0}>0$ and $\mu _{f}\le 1$, then the random variable ${Y}^{f,g}$ defined by
\[{Y}^{f,g}:=\sum \limits_{i=0}^{\tau -1}{V_{i}^{f,g}},\hspace{1em}\text{where}\hspace{2.5pt}\tau :=\inf \big\{n\ge 0:{V_{n}^{f,g}}=0\big\},\]
is almost surely finite and provides the total progeny of $\{{V_{n}^{f,g}}:n\ge 0\}$. In view of what follows, we consider the probability generating function
\[\mathcal{G}_{f,g}(s):=\sum \limits_{k\ge 0}{\pi _{k}^{f,g}}{s}^{k},\]
where $\{{\pi _{k}^{f,g}}:k\ge 0\}$ is the probability mass function of the random variable ${Y}^{f,g}$. Moreover, we have the mean value
(2)
\[{\nu }^{f,g}:=\sum \limits_{k\ge 0}k{\pi _{k}^{f,g}},\hspace{1em}\text{and we have}\hspace{1em}{\nu }^{f,g}=\frac{\mu _{g}}{1-\mu _{f}};\]
in particular, ${\nu }^{f,g}=\frac{\mu _{g}}{1-\mu _{f}}$ even if $\mu _{f}=1$, namely ${\nu }^{f,g}=\infty $ if $\mu _{f}=1$ (with the convention $\frac{\mu _{g}}{0}:=\infty $).
Finally, we recall some well-known connections between total progeny and offspring distributions (see e.g. [7]): for the probability mass functions, we have
(3)
\[{\pi _{k}^{f,\mathrm{id}}}=\frac{1}{k}{p_{k-1}^{\ast k}}\hspace{1em}\text{for all}\hspace{2.5pt}k\ge 1,\]
where $\{{p_{h}^{\ast n}}:h\ge 0\}$ is the nth convolution power of $\{p_{h}:h\ge 0\}$; for the probability generating functions, we have
(4)
\[\mathcal{G}_{f,\mathrm{id}}(s)=sf\big(\mathcal{G}_{f,\mathrm{id}}(s)\big).\]
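As an illustrative numerical check (not part of the paper) of the classical identity ${\pi _{k}^{f,\mathrm{id}}}=\frac{1}{k}{p_{k-1}^{\ast k}}$ for the total progeny started from one individual: with Bernoulli offspring ($p_{0}=0.6$, $p_{1}=0.4$) the total progeny is geometric, ${\pi _{k}^{f,\mathrm{id}}}=0.6\cdot {0.4}^{k-1}$, and the convolution powers can be computed directly:

```python
def convolve(a, b):
    """Convolution of two finitely supported pmfs given as lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def total_progeny_pmf(p, kmax):
    """pi_k = (1/k) * p^{*k}_{k-1} for k = 1..kmax."""
    pmf, power = [], [1.0]          # p^{*0} is the unit mass at 0
    for k in range(1, kmax + 1):
        power = convolve(power, p)  # now power = p^{*k}
        pmf.append(power[k - 1] / k)
    return pmf

# Bernoulli offspring: p_0 = 0.6, p_1 = 0.4
pi = total_progeny_pmf([0.6, 0.4], 8)
```

Each entry of `pi` agrees with the geometric mass function $0.6\cdot {0.4}^{k-1}$.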
2.2 Preliminaries on large deviations
We start with the concept of large deviation principle (LDP). A sequence of random variables $\{W_{n}:n\ge 1\}$ taking values in a topological space $\mathcal{W}$ satisfies the LDP with rate function $I:\mathcal{W}\to [0,\infty ]$ if I is a lower semicontinuous function,
\[\underset{n\to \infty }{\liminf }\frac{1}{n}\log P(W_{n}\in O)\ge -\underset{w\in O}{\inf }I(w)\hspace{1em}\text{for all open sets}\hspace{2.5pt}O,\]
and
\[\underset{n\to \infty }{\limsup }\frac{1}{n}\log P(W_{n}\in C)\le -\underset{w\in C}{\inf }I(w)\hspace{1em}\text{for all closed sets}\hspace{2.5pt}C.\]
We also recall that a rate function I is said to be good if all its level sets $\{\{w\in \mathcal{W}:I(w)\le \eta \}:\eta \ge 0\}$ are compact.
Remark 2.
If $P(W_{n}\in S)=1$ for some closed set S (at least eventually with respect to n), then $I(w)=\infty $ for $w\notin S$; this can be checked by taking the lower bound for the open set $O={S}^{c}$.
In particular, we refer to Cramér’s theorem on ${\mathbb{R}}^{d}$ (see e.g. Theorems 2.2.3 and 2.2.30 in [6] for the cases $d=1$ and $d\ge 2$), and we recall its statement. We remark that, in this paper, we consider the cases $d=1$ (in such a case, the rate function need not be a good rate function) and $d=2$. Moreover, we use the symbol $\langle \cdot ,\cdot \rangle $ for the inner product in ${\mathbb{R}}^{d}$.
Theorem 1 (Cramér’s theorem).
Let $\{W_{n}:n\ge 1\}$ be a sequence of i.i.d. ${\mathbb{R}}^{d}$-valued random variables, and let $\{\bar{W}_{n}:n\ge 1\}$ be the sequence of empirical means defined by $\bar{W}_{n}:=\frac{1}{n}{\sum _{k=1}^{n}}W_{k}$ (for all $n\ge 1$).
(i) If $d=1$, then $\{\bar{W}_{n}:n\ge 1\}$ satisfies the LDP with rate function I defined by
\[I(w):=\underset{\theta \in \mathbb{R}}{\sup }\big\{\theta w-\log \mathbb{E}\big[{e}^{\theta W_{1}}\big]\big\}.\]
(ii) If $d\ge 2$ and the origin of ${\mathbb{R}}^{d}$ belongs to the interior of the set $\{\theta \in {\mathbb{R}}^{d}:\log \mathbb{E}[{e}^{\langle \theta ,W_{1}\rangle }]<\infty \}$, then $\{\bar{W}_{n}:n\ge 1\}$ satisfies the LDP with good rate function I defined by
\[I(w):=\underset{\theta \in {\mathbb{R}}^{d}}{\sup }\big\{\langle \theta ,w\rangle -\log \mathbb{E}\big[{e}^{\langle \theta ,W_{1}\rangle }\big]\big\}.\]
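For a concrete instance of part (i) (the Poisson law and all numbers below are arbitrary illustrative choices, not from the paper): for Poisson(λ) summands we have $\log \mathbb{E}[{e}^{\theta W_{1}}]=\lambda ({e}^{\theta }-1)$, whose Legendre transform is $I(x)=x\log (x/\lambda )-x+\lambda $ for $x>0$; a crude grid search over θ recovers it:

```python
import math

def rate_grid(x, log_mgf, lo=-10.0, hi=10.0, steps=40001):
    """Approximate I(x) = sup_theta { theta*x - log_mgf(theta) } on a grid."""
    step = (hi - lo) / (steps - 1)
    return max((lo + i * step) * x - log_mgf(lo + i * step) for i in range(steps))

lam = 2.0
log_mgf = lambda t: lam * (math.exp(t) - 1.0)   # Poisson(lam)
x = 3.5
exact = x * math.log(x / lam) - x + lam         # closed-form Legendre transform
approx = rate_grid(x, log_mgf)
```

As expected, the rate vanishes at the mean $x=\lambda $ and is strictly positive elsewhere.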
3 Applications of Cramér’s theorem
The aim of this section is to prove Propositions 1 and 2. In view of this, we recall Lemmas 1 and 2, which give two immediate applications of Cramér’s theorem (Theorem 1) with $d=1$; in Lemma 2, we consider the case with a unitary initial population almost surely (thus, as stated in Remark 1, the case with $q_{1}=1$ or, equivalently, $g=\mathrm{id}$).
Lemma 1 (Cramér’s theorem for offspring distribution).
Let $\{X_{n}:n\ge 1\}$ be i.i.d. random variables with probability generating function f. Let $\{\bar{X}_{n}:n\ge 1\}$ be the sequence of empirical means defined by $\bar{X}_{n}:=\frac{1}{n}{\sum _{k=1}^{n}}X_{k}$ (for all $n\ge 1$). Then $\{\bar{X}_{n}:n\ge 1\}$ satisfies the LDP with rate function $I_{f}$ defined by $I_{f}(x):=\sup _{\alpha \in \mathbb{R}}\{\alpha x-\log f({e}^{\alpha })\}$.
Lemma 2 (Cramér’s theorem for total progeny distribution with $g=\mathrm{id}$).
Assume that $p_{0}>0$ and $\mu _{f}\le 1$. Let $\{Y_{n}:n\ge 1\}$ be i.i.d. random variables with probability generating function $\mathcal{G}_{f,\mathrm{id}}$. Let $\{\bar{Y}_{n}:n\ge 1\}$ be the sequence of empirical means defined by $\bar{Y}_{n}:=\frac{1}{n}{\sum _{k=1}^{n}}Y_{k}$ (for all $n\ge 1$). Then $\{\bar{Y}_{n}:n\ge 1\}$ satisfies the LDP with rate function $I_{\mathcal{G}_{f,\mathrm{id}}}$ defined by $I_{\mathcal{G}_{f,\mathrm{id}}}(y):=\sup _{\beta \in \mathbb{R}}\{\beta y-\log \mathcal{G}_{f,\mathrm{id}}({e}^{\beta })\}$.
Now we can prove our main results. We start with Proposition 1, which provides an expression for $I_{\mathcal{G}_{f,\mathrm{id}}}$ in terms of $I_{f}$.
Proposition 1.
Assume that $p_{0}>0$ and $\mu _{f}\le 1$. Then, for $I_{f}$ and $I_{\mathcal{G}_{f,\mathrm{id}}}$ as in Lemmas 1 and 2, we have
\[I_{\mathcal{G}_{f,\mathrm{id}}}(y)=\left\{\begin{array}{l@{\hskip10.0pt}l}yI_{f}\big(\frac{y-1}{y}\big)& \hspace{2.5pt}\text{if}\hspace{2.5pt}y\ge 1,\\{} \infty & \hspace{2.5pt}\text{otherwise}.\end{array}\right.\]
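As a numerical sanity check of this relationship, in the form $I_{\mathcal{G}_{f,\mathrm{id}}}(y)=yI_{f}(\frac{y-1}{y})$ for $y\ge 1$ (all the distributions below are arbitrary illustrative choices): for Bernoulli offspring with $f(s)=0.6+0.4s$, the functional equation $\mathcal{G}_{f,\mathrm{id}}(s)=sf(\mathcal{G}_{f,\mathrm{id}}(s))$ gives $\mathcal{G}_{f,\mathrm{id}}(s)=\frac{0.6s}{1-0.4s}$ for $s<2.5$, so both rate functions can be approximated by a grid search and compared:

```python
import math

def sup_grid(phi, lo, hi, steps=100001):
    """Crude grid approximation of sup of phi over [lo, hi]."""
    step = (hi - lo) / (steps - 1)
    return max(phi(lo + i * step) for i in range(steps))

def I_f(x):
    # Legendre transform for Bernoulli offspring: f(s) = 0.6 + 0.4*s
    return sup_grid(lambda a: a * x - math.log(0.6 + 0.4 * math.exp(a)),
                    -30.0, 30.0)

def I_G(y):
    # total-progeny pgf: G(s) = 0.6*s / (1 - 0.4*s), finite for s < 2.5
    return sup_grid(lambda b: b * y
                    - math.log(0.6 * math.exp(b) / (1.0 - 0.4 * math.exp(b))),
                    -30.0, math.log(2.5) - 1e-3)
```

For instance, $I_{\mathcal{G}_{f,\mathrm{id}}}(2)$ and $2I_{f}(1/2)$ agree up to the grid error.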
Proof.
We remark that
\[I_{f}(x):=\underset{\alpha \in \mathcal{D}(f)}{\sup }\big\{\alpha x-\log f\big({e}^{\alpha }\big)\big\},\]
where $\mathcal{D}(f):=\{\alpha \in \mathbb{R}:f({e}^{\alpha })<\infty \}$, and
\[I_{\mathcal{G}_{f,\mathrm{id}}}(y):=\underset{\beta \in \mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}{\sup }\big\{\beta y-\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big\},\]
where $\mathcal{D}(\mathcal{G}_{f,\mathrm{id}}):=\{\beta \in \mathbb{R}:\mathcal{G}_{f,\mathrm{id}}({e}^{\beta })<\infty \}$, by Lemmas 1 and 2, respectively.
Moreover, the function $\alpha :\mathcal{D}(\mathcal{G}_{f,\mathrm{id}})\to \mathcal{D}(f)$ defined by
\[\alpha (\beta ):=\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\]
is a bijection. This can be checked noting that $\alpha (\beta )\in \mathcal{D}(f)$ (for all $\beta \in \mathcal{D}(\mathcal{G}_{f,\mathrm{id}})$) because $f({e}^{\alpha (\beta )})=f(\mathcal{G}_{f,\mathrm{id}}({e}^{\beta }))=\frac{\mathcal{G}_{f,\mathrm{id}}({e}^{\beta })}{{e}^{\beta }}<\infty $ (here we take into account (4)); moreover, its inverse $\beta :\mathcal{D}(f)\to \mathcal{D}(\mathcal{G}_{f,\mathrm{id}})$ is defined by
\[\beta (\alpha ):=\log {\mathcal{G}_{f,\mathrm{id}}^{-1}}\big({e}^{\alpha }\big)\]
(where ${\mathcal{G}_{f,\mathrm{id}}^{-1}}$ is the inverse of $\mathcal{G}_{f,\mathrm{id}}$), and $\beta (\alpha )\in \mathcal{D}(\mathcal{G}_{f,\mathrm{id}})$ (for all $\alpha \in \mathcal{D}(f)$) because $\mathcal{G}_{f,\mathrm{id}}({e}^{\beta (\alpha )})={e}^{\alpha }<\infty $.
Thus, we can set $\alpha =\log \mathcal{G}_{f,\mathrm{id}}({e}^{\beta })$ (for $\beta \in \mathcal{D}(\mathcal{G}_{f,\mathrm{id}})$) in the expression of $I_{f}(x)$, and we get
\[I_{f}(x)=\underset{\beta \in \mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}{\sup }\big\{\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)x-\log f\big(\mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big)\big\}.\]
Then (we take into account (4) in the second equality below)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle I_{f}(x)& \displaystyle =\underset{\beta \in \mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}{\sup }\big\{\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)x-\log \big({e}^{-\beta }{e}^{\beta }f\big(\mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big)\big)\big\}\\{} & \displaystyle =\underset{\beta \in \mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}{\sup }\big\{\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)x+\beta -\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big\}\\{} & \displaystyle =\underset{\beta \in \mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}{\sup }\big\{\beta -(1-x)\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big\},\end{array}\]
and, for $x\in [0,1)$, we get
\[I_{f}(x)=(1-x)\underset{\beta \in \mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}{\sup }\bigg\{\beta \frac{1}{1-x}-\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\bigg\}=(1-x)I_{\mathcal{G}_{f,\mathrm{id}}}\bigg(\frac{1}{1-x}\bigg).\]
We conclude by taking $x=\frac{y-1}{y}$ for $y\ge 1$ (thus, $x\in [0,1)$), and we obtain the desired equality with some easy computations. □
Now we present Proposition 2, which concerns the LDP for the empirical means of i.i.d. bivariate random variables $\{(Y_{n},Z_{n}):n\ge 1\}$ distributed as $({Y}^{f,g},{V_{0}^{f,g}})$. In particular, we obtain an expression for the rate function $I_{\mathcal{G}_{f,g},g}$ in terms of $I_{f}$ in Lemma 1 and $I_{g}$ defined by
(5)
\[I_{g}(z):=\underset{\gamma \in \mathbb{R}}{\sup }\big\{\gamma z-\log g\big({e}^{\gamma }\big)\big\}.\]
Proposition 2.
Let $\{(Y_{n},Z_{n}):n\ge 1\}$ be i.i.d. random variables distributed as $({Y}^{f,g},{V_{0}^{f,g}})$. Assume that $\mathbb{E}[{e}^{\beta {Y}^{f,g}+\gamma {V_{0}^{f,g}}}]$ is finite in a neighborhood of $(\beta ,\gamma )=(0,0)$. Let $\{(\bar{Y}_{n},\bar{Z}_{n}):n\ge 1\}$ be the sequence of empirical means defined by $(\bar{Y}_{n},\bar{Z}_{n}):=(\frac{1}{n}{\sum _{k=1}^{n}}Y_{k},\frac{1}{n}{\sum _{k=1}^{n}}Z_{k})$ (for all $n\ge 1$). Then $\{(\bar{Y}_{n},\bar{Z}_{n}):n\ge 1\}$ satisfies the LDP with good rate function $I_{\mathcal{G}_{f,g},g}$ defined by
\[I_{\mathcal{G}_{f,g},g}(y,z)=\left\{\begin{array}{l@{\hskip10.0pt}l}yI_{f}\big(\frac{y-z}{y}\big)+I_{g}(z)& \hspace{2.5pt}\text{if}\hspace{2.5pt}y\ge z>0,\\{} I_{g}(0)& \hspace{2.5pt}\text{if}\hspace{2.5pt}y=z=0,\\{} \infty & \hspace{2.5pt}\text{otherwise}.\end{array}\right.\]
Remark 3.
We are assuming (implicitly) that $p_{0}>0$ and $\mu _{f}\le 1$; in fact, since we require that $\mathbb{E}[{e}^{\beta {Y}^{f,g}+\gamma {V_{0}^{f,g}}}]$ is finite in a neighborhood of $(\beta ,\gamma )=(0,0)$, we are assuming that $\mu _{f}<1$ and $\mu _{g}<\infty $.
Proof.
The LDP is a consequence of Cramér’s theorem (Theorem 1) with $d=2$, and the rate function $I_{\mathcal{G}_{f,g},g}$ is defined by
\[I_{\mathcal{G}_{f,g},g}(y,z):=\underset{\beta ,\gamma \in \mathbb{R}}{\sup }\big\{\beta y+\gamma z-\log \mathbb{E}\big[{e}^{\beta {Y}^{f,g}+\gamma {V_{0}^{f,g}}}\big]\big\}.\]
Throughout the proof, we restrict our attention to the pairs $(y,z)$ such that $y\ge z\ge 0$. In fact, almost surely, we have ${Y}^{f,g}\ge {V_{0}^{f,g}}\ge 0$, and therefore $\bar{Y}_{n}\ge \bar{Z}_{n}\ge 0$; thus, by Remark 2 we have $I_{\mathcal{G}_{f,g},g}(y,z)=\infty $ if the condition $y\ge z\ge 0$ fails.
We remark that $\mathbb{E}[{s}^{{Y}^{f,g}}|{V_{0}^{f,g}}]={(\mathcal{G}_{f,\mathrm{id}}(s))}^{{V_{0}^{f,g}}}$, and therefore
\[\mathbb{E}\big[{e}^{\beta {Y}^{f,g}+\gamma {V_{0}^{f,g}}}\big]=\mathbb{E}\big[{e}^{\gamma {V_{0}^{f,g}}}{\big(\mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big)}^{{V_{0}^{f,g}}}\big]=g\big({e}^{\gamma }\mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big);\]
thus,
\[I_{\mathcal{G}_{f,g},g}(y,z)=\underset{\beta ,\gamma \in \mathbb{R}}{\sup }\big\{\beta y+\gamma z-\log g\big({e}^{\gamma +\log \mathcal{G}_{f,\mathrm{id}}({e}^{\beta })}\big)\big\}.\]
Furthermore, the function
\[(\beta ,\gamma )\mapsto \big(\beta ,\gamma +\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big)\]
is a bijection defined on $\mathcal{D}(\mathcal{G}_{f,\mathrm{id}})\times \mathbb{R}$, where
\[\mathcal{D}(\mathcal{G}_{f,\mathrm{id}}):=\big\{\beta \in \mathbb{R}:\mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)<\infty \big\}\]
as in the proof of Proposition 1; then, for $\delta :=\gamma +\log \mathcal{G}_{f,\mathrm{id}}({e}^{\beta })$, we obtain
\[I_{\mathcal{G}_{f,g},g}(y,z)=\underset{\beta ,\delta \in \mathbb{R}}{\sup }\big\{\beta y+\big(\delta -\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big)z-\log g\big({e}^{\delta }\big)\big\}.\]
Thus, we have (note that the last equality holds by Proposition 1)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle I_{\mathcal{G}_{f,g},g}(y,z)& \displaystyle \le \underset{\beta \in \mathbb{R}}{\sup }\big\{\beta y-z\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big\}+\underset{\delta \in \mathbb{R}}{\sup }\big\{\delta z-\log g\big({e}^{\delta }\big)\big\}\\{} & \displaystyle =\left\{\begin{array}{l@{\hskip10.0pt}l}zI_{\mathcal{G}_{f,\mathrm{id}}}(y/z)+I_{g}(z)& \hspace{2.5pt}\text{if}\hspace{2.5pt}y\ge z>0,\\{} I_{g}(0)& \hspace{2.5pt}\text{if}\hspace{2.5pt}y=z=0,\\{} \infty & \hspace{2.5pt}\text{otherwise.}\end{array}\right.\\{} & \displaystyle =\left\{\begin{array}{l@{\hskip10.0pt}l}yI_{f}(\frac{y-z}{y})+I_{g}(z)& \hspace{2.5pt}\text{if}\hspace{2.5pt}y\ge z>0,\\{} I_{g}(0)& \hspace{2.5pt}\text{if}\hspace{2.5pt}y=z=0,\\{} \infty & \hspace{2.5pt}\text{otherwise}.\end{array}\right.\end{array}\]
We conclude by showing the inverse inequality
(6)
\[I_{\mathcal{G}_{f,g},g}(y,z)\ge \underset{\beta \in \mathbb{R}}{\sup }\big\{\beta y-z\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big\}+\underset{\delta \in \mathbb{R}}{\sup }\big\{\delta z-\log g\big({e}^{\delta }\big)\big\}.\]
To this end, we take two sequences $\{\beta _{n}:n\ge 1\}$ and $\{\delta _{n}:n\ge 1\}$ such that
\[\underset{n\to \infty }{\lim }\beta _{n}y-z\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta _{n}}\big)=\underset{\beta \in \mathbb{R}}{\sup }\big\{\beta y-z\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta }\big)\big\}\]
and
\[\underset{n\to \infty }{\lim }\delta _{n}z-\log g\big({e}^{\delta _{n}}\big)=\underset{\delta \in \mathbb{R}}{\sup }\big\{\delta z-\log g\big({e}^{\delta }\big)\big\}.\]
Then we have
\[I_{\mathcal{G}_{f,g},g}(y,z)\ge \beta _{n}y+\big(\delta _{n}-\log \mathcal{G}_{f,\mathrm{id}}\big({e}^{\beta _{n}}\big)\big)z-\log g\big({e}^{\delta _{n}}\big),\]
and we get (6) by letting n go to infinity. □
4 Large deviations for estimators of $\mu _{f}$
In this section, we prove two LDPs for two sequences of estimators of the offspring mean $\mu _{f}$. Namely, if $\{(\bar{Y}_{n},\bar{Z}_{n}):n\ge 1\}$ is the sequence in Proposition 2 (see also the precise assumptions in Remark 3; in particular, we have $\mu _{f}<1$), then we consider:
• the sequence $\{\frac{\bar{Y}_{n}-\bar{Z}_{n}}{\bar{Y}_{n}}:n\ge 1\}$;
• the sequence $\{\frac{\bar{Y}_{n}-\mu _{g}}{\bar{Y}_{n}}:n\ge 1\}$.
Obviously, these estimators are well defined if the denominators $\bar{Y}_{n}$ are different from zero; then, in order to have well-defined estimators, we always assume that $q_{0}=0$ (where $q_{0}$ is as in (1)), and, noting that, in general, $I_{g}(0)=-\log q_{0}$, we have
\[I_{g}(0)=\infty .\]
Moreover, both sequences converge to $\frac{{\nu }^{f,g}-\mu _{g}}{{\nu }^{f,g}}=\mu _{f}$ as $n\to \infty $ (see ${\nu }^{f,g}$ in (2)), and they coincide when the initial population is deterministic (equal to $\mu _{g}$ almost surely).
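An illustrative simulation of the two estimators (a sketch, not part of the paper; the offspring and initial-population laws below are arbitrary choices with $\mu _{f}=0.4$, $\mu _{g}=2$, and $q_{0}=0$):

```python
import random

rng = random.Random(7)

def offspring():
    # illustrative offspring law: one child with probability 0.4 (mu_f = 0.4)
    return 1 if rng.random() < 0.4 else 0

def initial():
    # illustrative initial population, uniform on {1, 2, 3} (mu_g = 2, q_0 = 0)
    return rng.randint(1, 3)

def total_progeny(v0):
    """Y = V_0 + V_1 + ... until extinction (a.s. finite since mu_f < 1)."""
    total, current = 0, v0
    while current > 0:
        total += current
        current = sum(offspring() for _ in range(current))
    return total

n = 20000
samples = []
for _ in range(n):
    z = initial()
    samples.append((total_progeny(z), z))
ybar = sum(y for y, _ in samples) / n
zbar = sum(z for _, z in samples) / n
est1 = (ybar - zbar) / ybar   # first estimator of mu_f
est2 = (ybar - 2.0) / ybar    # second estimator (uses mu_g = 2)
```

Both empirical estimates are close to $\mu _{f}=0.4$ for this sample size.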
The LDPs of these two sequences are proved in Propositions 3 and 4. Moreover, Corollary 1 and Remark 4 concern the comparison between the convergence of the first sequence $\{\frac{\bar{Y}_{n}-\bar{Z}_{n}}{\bar{Y}_{n}}:n\ge 1\}$ and its analogue when the initial population is deterministic (equal to the mean). Propositions 3 and 4 are proved by combining the contraction principle (see e.g. Theorem 4.2.1 in [6]) and Proposition 2 (note that the rate function $I_{\mathcal{G}_{f,g},g}$ in Proposition 2 is good, as it is required to apply the contraction principle). We remark that, in the proofs of Propositions 3 and 4, we take into account that $I_{\mathcal{G}_{f,g},g}(0,0)=\infty $ by Proposition 2 and $I_{g}(0)=\infty $. At the end of this section, we present some remarks on the comparison between the rate functions in Propositions 3 and 4 (Remarks 5 and 6).
We start with the LDP of the first sequence of estimators.
Proposition 3.
Assume the same hypotheses as in Proposition 2, together with $q_{0}=0$. Let $\{(Y_{n},Z_{n}):n\ge 1\}$ be i.i.d. random variables distributed as $({Y}^{f,g},{V_{0}^{f,g}})$. Let $\{(\bar{Y}_{n},\bar{Z}_{n}):n\ge 1\}$ be the sequence of empirical means defined by $(\bar{Y}_{n},\bar{Z}_{n}):=(\frac{1}{n}{\sum _{k=1}^{n}}Y_{k},\frac{1}{n}{\sum _{k=1}^{n}}Z_{k})$ (for all $n\ge 1$). Then $\{\frac{\bar{Y}_{n}-\bar{Z}_{n}}{\bar{Y}_{n}}:n\ge 1\}$ satisfies the LDP with good rate function $J_{\mathcal{G}_{f,g},g}$ defined by
\[J_{\mathcal{G}_{f,g},g}(x):=\left\{\begin{array}{l@{\hskip10.0pt}l}-\log g\big({e}^{-\frac{I_{f}(x)}{1-x}}\big)& \hspace{2.5pt}\text{if}\hspace{2.5pt}x\in [0,1),\\{} \infty & \hspace{2.5pt}\text{otherwise}.\end{array}\right.\]
Proof.
By Proposition 2 and the contraction principle we have the LDP of $\{\frac{\bar{Y}_{n}-\bar{Z}_{n}}{\bar{Y}_{n}}:n\ge 1\}$ with good rate function $J_{\mathcal{G}_{f,g},g}$ defined by
\[J_{\mathcal{G}_{f,g},g}(x):=\inf \bigg\{I_{\mathcal{G}_{f,g},g}(y,z):y\ge z>0,\frac{y-z}{y}=x\bigg\}.\]
The case $x\notin [0,1)$ is trivial because we have the infimum over the empty set. For $x\in [0,1)$, we rewrite this expression as follows (where we take into account the expression of the rate function $I_{\mathcal{G}_{f,g},g}$ in Proposition 2):
\[\begin{array}{r@{\hskip0pt}l}\displaystyle J_{\mathcal{G}_{f,g},g}(x)& \displaystyle =\inf \bigg\{I_{\mathcal{G}_{f,g},g}\bigg(\frac{z}{1-x},z\bigg):z>0\bigg\}\\{} & \displaystyle =\inf \bigg\{\frac{z}{1-x}I_{f}\bigg(\frac{\frac{z}{1-x}-z}{\frac{z}{1-x}}\bigg)+I_{g}(z):z>0\bigg\}\\{} & \displaystyle =\inf \bigg\{\frac{z}{1-x}I_{f}(x)+I_{g}(z):z>0\bigg\}\\{} & \displaystyle =-\sup \bigg\{-z\frac{I_{f}(x)}{1-x}-I_{g}(z):z>0\bigg\};\end{array}\]
thus, since $I_{g}(z)=\infty $ for $z\le 0$, we obtain $J_{\mathcal{G}_{f,g},g}(x)=-\log g({e}^{-\frac{I_{f}(x)}{1-x}})$ by taking into account the definition of $I_{g}$ in (5) and the well-known properties of Legendre transforms (see e.g. Lemma 4.5.8 in [6]; see also Lemma 2.2.5(a) and Exercise 2.2.22 in [6] for the convexity and the lower semicontinuity of $\gamma \mapsto \log g({e}^{\gamma })$). □
We have an immediate consequence of this proposition that concerns the case with a deterministic initial population equal to $\mu _{g}$ (almost surely). Namely, if we consider the probability generating function $g_{\diamond }$ defined by $g_{\diamond }(s):={s}^{\mu _{g}}$ (for all s), then we mean the case $g=g_{\diamond }$, and therefore:
• ${V_{0}^{f,g_{\diamond }}}=\mu _{g}$ almost surely; thus, $Z_{n}=\mu _{g}$ and $\bar{Z}_{n}=\mu _{g}$ almost surely (for all $n\ge 1$);
• $\{{Y_{n}^{f,g_{\diamond }}}:n\ge 1\}$ are i.i.d. random variables distributed as ${Y}^{f,g_{\diamond }}$, that is, with probability generating function $\mathcal{G}_{f,g_{\diamond }}(s)={\big(\mathcal{G}_{f,\mathrm{id}}(s)\big)}^{\mu _{g}}$;
• the rate function $J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}$ is
(7)
\[J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}(x)=\left\{\begin{array}{l@{\hskip10.0pt}l}\mu _{g}\cdot \frac{I_{f}(x)}{1-x}& \hspace{2.5pt}\text{if}\hspace{2.5pt}x\in [0,1),\\{} \infty & \hspace{2.5pt}\text{otherwise.}\end{array}\right.\]
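The Legendre-duality step $\inf _{z>0}\{za+I_{g}(z)\}=-\log g({e}^{-a})$ used at the end of the proof of Proposition 3 can be checked numerically; here (an arbitrary illustrative choice with $q_{0}=0$, not from the paper) the initial population is $1+\mathrm{Poisson}(\lambda )$, so $g(s)=s{e}^{\lambda (s-1)}$ and $I_{g}$ has a closed form:

```python
import math

lam = 1.5  # illustrative: V_0 = 1 + Poisson(lam), hence q_0 = 0

def log_g_exp(t):
    """log g(e^t) for g(s) = s * exp(lam*(s-1))."""
    return t + lam * (math.exp(t) - 1.0)

def I_g(z):
    """Legendre transform of log g(e^.) (shifted-Poisson rate function)."""
    if z < 1:
        return math.inf
    if z == 1:
        return lam
    return (z - 1) * math.log((z - 1) / lam) - (z - 1) + lam

a = 0.7                        # stands for I_f(x)/(1-x) in the proof
# grid search for inf over z > 0 of z*a + I_g(z)
lhs = min(z * a + I_g(z) for z in (1 + 0.001 * k for k in range(20000)))
rhs = -log_g_exp(-a)           # = -log g(e^{-a})
```

The grid minimum `lhs` matches `rhs` up to the discretization error, as predicted by the duality of Legendre transforms.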
Corollary 1 (Comparison between $J_{\mathcal{G}_{f,g},g}$ in Proposition 3 and $J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}$).
We have $J_{\mathcal{G}_{f,g},g}(x)\le J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}(x)$ for all $x\in \mathbb{R}$. Moreover, the inequality turns into an equality if and only if one of the following cases holds:
• $x\notin [0,1)$ and $J_{\mathcal{G}_{f,g},g}(x)=J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}(x)=\infty $;
• $x=\mu _{f}$ and $J_{\mathcal{G}_{f,g},g}(x)=J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}(x)=0$;
• ${V_{0}^{f,g}}$ is deterministic, equal to $\mu _{g}$, and $J_{\mathcal{G}_{f,g},g}(x)=J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}(x)$ for all $x\in \mathbb{R}$.
Proof.
The case $x\notin [0,1)$ is trivial. If instead $x\in [0,1)$, then by Jensen’s inequality we have
\[g\big({e}^{-\frac{I_{f}(x)}{1-x}}\big)=\mathbb{E}\big[{e}^{-\frac{I_{f}(x)}{1-x}{V_{0}^{f,g}}}\big]\ge {e}^{-\frac{I_{f}(x)}{1-x}\mu _{g}},\hspace{1em}\text{whence}\hspace{2.5pt}J_{\mathcal{G}_{f,g},g}(x)=-\log g\big({e}^{-\frac{I_{f}(x)}{1-x}}\big)\le \mu _{g}\frac{I_{f}(x)}{1-x}=J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}(x);\]
moreover, the cases where the inequality turns into an equality follow from the well-known properties of Jensen’s inequality. □
Remark 4 (Comparison between convergence of estimators of $\mu _{f}$).
Assume that $\mu _{f}>0$ and that the initial population is not deterministic. Then there exists $\eta >0$ such that
(8)
\[0<J_{\mathcal{G}_{f,g},g}(x)<J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}(x)\hspace{1em}\text{for}\hspace{2.5pt}x\in (\mu _{f}-\eta ,\mu _{f}+\eta )\setminus \{\mu _{f}\}.\]
Thus, we can say that $\{\frac{{\bar{Y}_{n}^{f,g_{\diamond }}}-\mu _{g}}{{\bar{Y}_{n}^{f,g_{\diamond }}}}:n\ge 1\}$ converges to $\mu _{f}$ (as $n\to \infty $) faster than $\{\frac{{\bar{Y}_{n}^{f,g}}-\bar{Z}_{n}}{{\bar{Y}_{n}^{f,g}}}:n\ge 1\}$; in fact, by (8), the large deviation probabilities of the first sequence decay on the exponential scale strictly faster than those of the second one.
We can repeat the same argument to say that $\{\frac{{\bar{Y}_{n}^{f,g_{\diamond }}}-\mu _{g}}{{\bar{Y}_{n}^{f,g_{\diamond }}}}:n\ge 1\}$ converges to $\mu _{f}$ (as $n\to \infty $) faster than $\{\bar{X}_{n}:n\ge 1\}$ in Lemma 1. In fact, we have ${V_{0}^{f,g_{\diamond }}}=\mu _{g}$ almost surely, $\mu _{g}$ is an integer, and, since $\mu _{g}>0$ because $q_{0}=0$, we have $\mu _{g}\ge 1$; then we have
\[J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}(x)=\mu _{g}\cdot \frac{I_{f}(x)}{1-x}>I_{f}(x)>0\hspace{1em}\text{for all}\hspace{2.5pt}x\in (0,1)\setminus \{\mu _{f}\}\]
(we can also consider the case $x=0$ if $\mu _{g}>1$).
Now we present the LDP for the second sequence of estimators.
Proposition 4.
Assume the same hypotheses as in Proposition 2, together with $q_{0}=0$. Let $\{Y_{n}:n\ge 1\}$ be i.i.d. random variables distributed as ${Y}^{f,g}$. Let $\{\bar{Y}_{n}:n\ge 1\}$ be the sequence of empirical means defined by $\bar{Y}_{n}:=\frac{1}{n}{\sum _{k=1}^{n}}Y_{k}$ (for all $n\ge 1$). Then $\{\frac{\bar{Y}_{n}-\mu _{g}}{\bar{Y}_{n}}:n\ge 1\}$ satisfies the LDP with good rate function $J_{\mu _{g}}$ defined by
\[J_{\mu _{g}}(x):=\left\{\begin{array}{l@{\hskip10.0pt}l}\inf \big\{\frac{\mu _{g}}{1-x}I_{f}\big(\frac{\frac{\mu _{g}}{1-x}-z}{\frac{\mu _{g}}{1-x}}\big)+I_{g}(z):z>0\big\}& \hspace{2.5pt}\text{if}\hspace{2.5pt}x<1,\\{} \infty & \hspace{2.5pt}\text{otherwise}.\end{array}\right.\]
Proof.
By Proposition 2 and the contraction principle we have the LDP of $\{\frac{\bar{Y}_{n}-\mu _{g}}{\bar{Y}_{n}}:n\ge 1\}$ with good rate function $J_{\mu _{g}}$ defined by
\[J_{\mu _{g}}(x):=\inf \bigg\{I_{\mathcal{G}_{f,g},g}(y,z):y\ge z>0,\frac{y-\mu _{g}}{y}=x\bigg\}.\]
The case $x\ge 1$ is trivial because we have the infimum over the empty set (we recall that $\mu _{g}>0$ because $q_{0}=0$). For $x<1$, we have
\[J_{\mu _{g}}(x)=\inf \bigg\{I_{\mathcal{G}_{f,g},g}\bigg(\frac{\mu _{g}}{1-x},z\bigg):z>0\bigg\},\]
and we obtain the desired formula by taking into account the expression of the rate function $I_{\mathcal{G}_{f,g},g}$ in Proposition 2. □
Remark 5 (We can have $J_{\mu _{g}}(x)<\infty $ for some $x<0$).
We know that, for $J_{\mathcal{G}_{f,g},g}$ in Proposition 3, we have $J_{\mathcal{G}_{f,g},g}(x)=\infty $ for $x\notin [0,1)$. On the contrary, as we now see, we could have $J_{\mu _{g}}(x)<\infty $ for some $x<0$. In order to explain this fact, we denote by $r_{\mathrm{min}}$ the minimum value r such that $q_{r}>0$; then we have $\mu _{g}\ge r_{\mathrm{min}}$; moreover, we have $\mu _{g}>r_{\mathrm{min}}$ if $q_{r_{\mathrm{min}}}<1$. In conclusion, we can say that, if $\mu _{g}>r_{\mathrm{min}}$, then the range of negative values x such that $J_{\mu _{g}}(x)<\infty $ is given by
(9)
\[1-\frac{\mu _{g}}{r_{\mathrm{min}}}\le x<0;\]
in fact, for $x<1$, both $I_{f}(\frac{\frac{\mu _{g}}{1-x}-z}{\frac{\mu _{g}}{1-x}})$ and $I_{g}(z)$ are finite for $z\in [r_{\mathrm{min}},\frac{\mu _{g}}{1-x}]$, and therefore we can say that $J_{\mu _{g}}(x)<\infty $ if $r_{\mathrm{min}}\le \frac{\mu _{g}}{1-x}$ or, equivalently, if (9) holds.
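A numerical illustration of this remark (all the distributions below are arbitrary choices, not from the paper): take Bernoulli offspring with $p_{1}=0.4$ and initial population with $q_{1}=q_{3}=\frac{1}{2}$, so that $\mu _{g}=2$, $r_{\mathrm{min}}=1$, and the predicted range of negative values with finite rate is $-1\le x<0$. A crude grid search over z, with $I_{f}$ and $I_{g}$ written in closed form, confirms this:

```python
import math

def I_f(u):
    """Rate function for Bernoulli offspring with p_1 = 0.4 (mu_f = 0.4)."""
    if u < 0 or u > 1:
        return math.inf
    r = 0.0
    if u > 0:
        r += u * math.log(u / 0.4)
    if u < 1:
        r += (1 - u) * math.log((1 - u) / 0.6)
    return r

def I_g(z):
    """Rate function for the initial population with q_1 = q_3 = 1/2."""
    if z < 1 or z > 3:
        return math.inf
    w = (z - 1) / 2  # weight on {3} of the tilted law with mean z
    r = math.log(2)
    if w > 0:
        r += w * math.log(w)
    if w < 1:
        r += (1 - w) * math.log(1 - w)
    return r

def J(x, steps=1001):
    """Grid approximation of the rate function in Proposition 4 (x < 1, mu_g = 2)."""
    y = 2.0 / (1.0 - x)
    return min(y * I_f((y - z) / y) + I_g(z)
               for z in (1 + 0.002 * k for k in range(steps)))
```

Here $J(-0.5)$ and $J(-1.0)$ are finite, while $J(-1.5)=\infty $, in agreement with $1-\frac{\mu _{g}}{r_{\mathrm{min}}}=-1$.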
Remark 6 (Estimators of $\mu _{f}$ when $\mu _{f}=0$).
If $\mu _{f}=0$, that is, $f(s)=1$ for all s or, equivalently, $p_{0}=1$, then the rate function in Proposition 3 is
\[J_{\mathcal{G}_{f,g},g}(x)=\left\{\begin{array}{l@{\hskip10.0pt}l}0& \hspace{2.5pt}\text{if}\hspace{2.5pt}x=0,\\{} \infty & \hspace{2.5pt}\text{otherwise}.\end{array}\right.\]
Then it is easy to check that $J_{\mathcal{G}_{f,g},g}$ coincides with $I_{f}$, and therefore $J_{\mathcal{G}_{f,g},g}$ coincides with $J_{\mathcal{G}_{f,g_{\diamond }},g_{\diamond }}$ in (7) (note that, in particular, we cannot have the strict inequalities in (8) in Remark 4 stated for the case $\mu _{f}>0$). Finally, if $\mu _{f}=0$ (and, as usual, $q_{0}=0$ or, equivalently, $\mu _{g}>0$), then we have $z=\frac{\mu _{g}}{1-x}$ in the variational formula of the rate function in Proposition 4, and therefore
(10)
\[J_{\mu _{g}}(x)=\left\{\begin{array}{l@{\hskip10.0pt}l}I_{g}\big(\frac{\mu _{g}}{1-x}\big)& \hspace{2.5pt}\text{if}\hspace{2.5pt}1-\frac{\mu _{g}}{r_{\mathrm{min}}}\le x<1,\\{} \infty & \hspace{2.5pt}\text{otherwise}.\end{array}\right.\]
Note that the rate function in (10) can also be derived by combining the contraction principle and the rate function $I_{g}$ for the empirical means $\{\bar{Z}_{n}:n\ge 1\}$; in fact, we have $\{\frac{\bar{Y}_{n}-\mu _{g}}{\bar{Y}_{n}}:n\ge 1\}=\{\frac{\bar{Z}_{n}-\mu _{g}}{\bar{Z}_{n}}:n\ge 1\}$, and the rate function $I_{g}$ is good by the hypotheses of Proposition 4 (see Proposition 2 and Remark 3). Finally, we also note that inequality (9) appears in the rate function expression (10).