1 Introduction
Financial institutions that issue loans follow company-specific (and/or country-defined) rules which act as a safeguard against issuing loans to people who are known to be insolvent. However, striving for higher profits might motivate some companies to issue loans to higher-risk clients. A company’s methods for evaluating loan risk are usually not publicly available. However, one way to evaluate whether too many knowingly very high-risk loans are issued, and whether insolvent clients are adequately separated from responsible ones, is to look at the numbers of defaulted and non-defaulted loans issued each day. The adequacy of a company’s rules for issuing loans can then be analysed by modelling, via copulas, the dependence between the number of defaulted loans and the number of non-defaulted loans. The advantage of such an approach is that copulas allow the marginal distributions (possibly from different distribution families) and their dependence structure (described by a copula) to be modelled separately. Because of this feature, copulas have been applied in many different fields, including survival analysis, hydrology, insurance risk analysis as well as finance (for examples of copula applications, see [3] or [4]), including the analysis of loans and their default rates.
The dependence of the default rate of loans on different credit risk categories was analysed in [5]. To model the dependence, copulas from ten different families were applied and three model selection tests were carried out. Because of the small sample size (24 observations per risk category) most of the copula families were not rejected and a single best copula model was not selected. To analyse whether dependence is affected by time, Fenech et al. [6] estimated the dependence among four different loan default indexes before and after the global financial crisis. They found that the dependence was different in these two periods. Four copula families were used to estimate the dependence between the default index pairs. While these studies were carried out for continuous data, discrete models created with copulas are less investigated: Genest and Nešlehová [8] discussed the differences and challenges of using copulas for discrete data compared to continuous data. Note that the previously mentioned studies assumed that the data does not depend on its own previous values. By using bivariate integer-valued autoregressive (BINAR) models it is possible to account for both the discreteness and the autocorrelation of the data. Furthermore, copulas can be used to model the dependence of innovations in BINAR(1) models: Karlis and Pedeli [10] used the Frank copula and the normal copula to model the dependence of the innovations of the BINAR(1) model.
In this paper we expand the use of copulas in BINAR models by analysing additional copula families for the innovations of the BINAR(1) model and by comparing different methods for BINAR(1) model parameter estimation. We also present a two-step method for the parameter estimation of the BINAR(1) model, where the model parameters are estimated separately from the dependence parameter of the copula. These estimation methods (including the one used in [10]) are compared via Monte Carlo simulations. Finally, in order to analyse the presence of autocorrelation and copula dependence in loan data, an empirical application is carried out for weekly loan data.
The paper is organized as follows. Section 2 presents the BINAR(1) process and its main properties, Section 3 presents the main properties of copulas as well as some copula functions. Section 4 compares different estimation methods for the BINAR(1) model and the dependence parameter of copulas via Monte Carlo simulations. In Section 5 an empirical application is carried out using different combinations of copula functions and marginal distribution functions. Conclusions are presented in Section 6.
2 The bivariate INAR(1) process
The BINAR(1) process was introduced in [18]. In this section we will provide the definition of the BINAR(1) model and will formulate its properties.
Definition 1.
Let ${\mathbf{R}_{t}}={[{R_{1,t}},{R_{2,t}}]^{\prime }}$, $t\in \mathbb{Z}$, be a sequence of independent identically distributed (i.i.d.) nonnegative integer-valued bivariate random variables. A bivariate integer-valued autoregressive process of order 1 (BINAR(1)), ${\mathbf{X}_{t}}={[{X_{1,t}},{X_{2,t}}]^{\prime }}$, $t\in \mathbb{Z}$, is defined as:
(1)
\[ {\mathbf{X}_{t}}=\mathbf{A}\circ {\mathbf{X}_{t-1}}+{\mathbf{R}_{t}}=\left[\begin{array}{c@{\hskip10.0pt}c}{\alpha _{1}}& 0\\ {} 0& {\alpha _{2}}\end{array}\right]\circ \left[\begin{array}{c}{X_{1,t-1}}\\ {} {X_{2,t-1}}\end{array}\right]+\left[\begin{array}{c}{R_{1,t}}\\ {} {R_{2,t}}\end{array}\right],\hspace{1em}t\in \mathbb{Z},\]
where ${\alpha _{j}}\in [0,1)$, $j=1,2$, and the symbol ‘∘’ is the thinning operator, applied according to the rules of matrix multiplication. Thus the jth ($j=1,2$) component is an INAR process of order 1 (INAR(1)):
(2)
\[ {X_{j,t}}={\alpha _{j}}\circ {X_{j,t-1}}+{R_{j,t}},\hspace{1em}t\in \mathbb{Z},\]
where ${\alpha _{j}}\circ {X_{j,t-1}}:={\sum _{i=1}^{{X_{j,t-1}}}}{Y_{j,t,i}}$ and ${Y_{j,t,1}},{Y_{j,t,2}},\dots \hspace{0.1667em}$ is a sequence of i.i.d. Bernoulli random variables with $\mathbb{P}({Y_{j,t,i}}=1)={\alpha _{j}}=1-\mathbb{P}({Y_{j,t,i}}=0)$, ${\alpha _{j}}\in [0,1)$, such that these sequences are mutually independent and independent of the sequence ${\mathbf{R}_{t}}$, $t\in \mathbb{Z}$. For each t, ${\mathbf{R}_{t}}$ is independent of ${\mathbf{X}_{s}}$, $s\mathrm{<}t$.
Properties of the thinning operator are provided in [17] and [19] with proofs for a selected few. We present the main properties of the thinning operator which will be used later on in the case of the BINAR(1) model. Denote by ‘$\stackrel{d}{=}$’ the equality of distributions.
Theorem 1 (Thinning operator properties).
Let $X,{X_{1}},{X_{2}}$ be nonnegative integer-valued random variables, such that $\mathbb{E}{Z^{2}}\mathrm{<}\infty $, $Z\in \{X,{X_{1}},{X_{2}}\}$, $\alpha ,{\alpha _{1}},{\alpha _{2}}\in [0,1)$ and let ‘∘’ be the thinning operator. Then the following properties hold:
- (a) ${\alpha _{1}}\circ ({\alpha _{2}}\circ X)\stackrel{d}{=}({\alpha _{1}}{\alpha _{2}})\circ X$;
- (b) $\alpha \circ ({X_{1}}+{X_{2}})\stackrel{d}{=}\alpha \circ {X_{1}}+\alpha \circ {X_{2}}$;
- (c) $\mathbb{E}(\alpha \circ X)=\alpha \mathbb{E}(X)$;
- (d) $\mathbb{V}\mathrm{ar}(\alpha \circ X)={\alpha ^{2}}\mathbb{V}\mathrm{ar}(X)+\alpha (1-\alpha )\mathbb{E}(X)$;
- (e) $\mathbb{E}((\alpha \circ {X_{1}}){X_{2}})=\alpha \mathbb{E}({X_{1}}{X_{2}})$;
- (f) $\mathbb{C}\mathrm{ov}(\alpha \circ {X_{1}},{X_{2}})=\alpha \mathbb{C}\mathrm{ov}({X_{1}},{X_{2}})$;
- (g) $\mathbb{E}(({\alpha _{1}}\circ {X_{1}})({\alpha _{2}}\circ {X_{2}}))={\alpha _{1}}{\alpha _{2}}\mathbb{E}({X_{1}}{X_{2}})$.
${X_{j,t}}$, defined in eq. (2), has two random components: the survivors of the elements of the process at time $t-1$, each with the probability of survival ${\alpha _{j}}$, which are denoted by ${\alpha _{j}}\circ {X_{j,t-1}}$, and the elements which enter the system in the interval $(t-1,t]$, which are called arrival elements and denoted by ${R_{j,t}}$. We can obtain a moving average representation by repeated substitution and the properties of the thinning operator, as in [1] or [11, p. 180]:
(3)
\[ {X_{j,t}}={\sum \limits_{k=0}^{\infty }}{\alpha _{j}^{k}}\circ {R_{j,t-k}},\hspace{1em}j=1,2,\]
where convergence on the right-hand side holds a.s.
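To make the definition concrete, the following minimal R sketch (our own illustration, not code from the original study) generates a BINAR(1) path using binomial thinning with independent Poisson innovations; the function name r_binar_path and the parameter values are chosen purely for illustration.

```r
# Simulate a BINAR(1) path: X_t = A o X_{t-1} + R_t with binomial thinning.
# Here the innovations R_t are independent Poisson pairs; a copula-linked
# innovation distribution can be plugged in instead (see Sections 3-4).
r_binar_path <- function(n, alpha = c(0.6, 0.4), lambda = c(1, 2)) {
  X <- matrix(0, nrow = n, ncol = 2)
  for (t in 2:n) {
    thinned <- rbinom(2, size = X[t - 1, ], prob = alpha)  # alpha_j o X_{j,t-1}
    X[t, ]  <- thinned + rpois(2, lambda)                  # + arrivals R_{j,t}
  }
  X
}

set.seed(123)
X <- r_binar_path(500)
colMeans(X)                        # close to lambda_j / (1 - alpha_j)
acf(X[, 1], plot = FALSE)$acf[2]   # lag-one autocorrelation, close to alpha_1
```

The empirical means and lag-one autocorrelations of the simulated components can be compared with the theoretical values given in Theorem 2 below.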
Now we present some properties of the BINAR(1) model. They will be used when analysing some of the parameter estimation methods. The proofs of these properties are easily derived and some of them are provided in [17].
Theorem 2 (Properties of the BINAR(1) process).
Let ${\textbf{X}_{t}}={({X_{1,t}},{X_{2,t}})^{\prime }}$ be a nonnegative integer-valued time series given in Def. 1 and ${\alpha _{j}}\in [0,1)$, $j=1,2$. Let ${\textbf{R}_{t}}={({R_{1,t}},{R_{2,t}})^{\prime }}$, $t\in \mathbb{Z}$, be nonnegative integer-valued random variables with $\mathbb{E}({R_{j,t}})={\lambda _{j}}$ and $\mathbb{V}\mathrm{ar}({R_{j,t}})={\sigma _{j}^{2}}\mathrm{<}\infty $, $j=1,2$. Then the following properties hold:
- (a) $\mathbb{E}{X_{j,t}}={\mu _{{X_{j}}}}=\frac{{\lambda _{j}}}{1-{\alpha _{j}}}$;
- (b) $\mathbb{E}({X_{j,t}}|{X_{j,t-1}})={\alpha _{j}}{X_{j,t-1}}+{\lambda _{j}}$;
- (c) $\mathbb{V}\mathrm{ar}({X_{j,t}})={\sigma _{{X_{j}}}^{2}}=\frac{{\sigma _{j}^{2}}+{\alpha _{j}}{\lambda _{j}}}{1-{\alpha _{j}^{2}}}$;
- (d) $\mathbb{C}\mathrm{ov}({X_{i,t}},{R_{j,t}})=\mathbb{C}\mathrm{ov}({R_{i,t}},{R_{j,t}})$, $i\ne j$;
- (e) $\mathbb{C}\mathrm{ov}({X_{j,t}},{X_{j,t+h}})={\alpha _{j}^{h}}{\sigma _{{X_{j}}}^{2}}$, $h\ge 0$;
- (f) $\mathbb{C}\mathrm{orr}({X_{j,t}},{X_{j,t+h}})={\alpha _{j}^{h}}$, $h\ge 0$;
- (g) $\displaystyle \mathbb{C}\mathrm{ov}({X_{i,t}},{X_{j,t+h}})=\frac{{\alpha _{j}^{h}}}{1-{\alpha _{i}}{\alpha _{j}}}\hspace{0.1667em}\mathbb{C}\mathrm{ov}({R_{i,t}},{R_{j,t}})$, $i\ne j$, $h\ge 0$;
- (h) $\displaystyle \mathbb{C}\mathrm{orr}({X_{i,t+h}},{X_{j,t}})=\frac{{\alpha _{i}^{h}}\sqrt{(1-{\alpha _{i}^{2}})(1-{\alpha _{j}^{2}})}}{(1-{\alpha _{i}}{\alpha _{j}})\sqrt{({\sigma _{i}^{2}}+{\alpha _{i}}{\lambda _{i}})({\sigma _{j}^{2}}+{\alpha _{j}}{\lambda _{j}})}}\hspace{0.1667em}\mathbb{C}\mathrm{ov}({R_{i,t}},{R_{j,t}})$, $i\ne j$, $h\ge 0$.
Similarly to (3), we have that
\[ {\mathbf{X}_{t}}={\sum \limits_{k=0}^{\infty }}{\mathbf{A}^{k}}\circ {\mathbf{R}_{t-k}},\]
where convergence on the right-hand side holds a.s.
Hence, the distributional properties of the BINAR(1) process can be studied in terms of ${\textbf{R}_{t}}$ values. Note also that, according to [12], if ${\alpha _{j}}\in [0,1)$, $j=1,2$, then there exists a unique stationary nonnegative integer-valued sequence ${\mathbf{X}_{t}}$, $t\in \mathbb{Z}$, satisfying (1).
From the covariance and correlation (see (g) and (h) in Theorem 2) of the BINAR(1) process we see that the dependence between ${X_{1,t}}$ and ${X_{2,t}}$ depends on the joint distribution of the innovations ${R_{1,t}}$, ${R_{2,t}}$. Pedeli and Karlis [18] analysed BINAR(1) models when the innovations were linked by either a bivariate Poisson or a bivariate negative binomial distribution, where the covariance of the innovations can be easily expressed in terms of their joint distribution parameters. Karlis and Pedeli [10] analysed two cases when the distributions of innovations of a BINAR(1) model are linked by either the Frank copula or a normal copula with either Poisson or negative binomial marginal distributions. We will expand their work by analysing additional copulas for the BINAR(1) model innovation distribution as well as estimation methods for the distribution parameters.
3 Copulas
In this section we recall the definition and main properties of bivariate copulas, mainly following [8, 15] and [21] for the continuous and discrete settings.
3.1 Copula definition and properties
Copulas are used for modelling the dependence between several random variables. The main advantage of using copulas is that they allow the marginal distributions to be modelled separately from the dependence structure. In this paper we use two-dimensional copulas, which are defined as follows:
Definition 2.
A bivariate copula is a function $C:{[0,1]^{2}}\to [0,1]$ satisfying the following conditions:
- (i) $C({u_{1}},0)=C(0,{u_{2}})=0$, $C({u_{1}},1)={u_{1}}$ and $C(1,{u_{2}})={u_{2}}$ for all ${u_{1}},{u_{2}}\in [0,1]$;
- (ii) C is 2-increasing, i.e. $C({v_{1}},{v_{2}})-C({v_{1}},{u_{2}})-C({u_{1}},{v_{2}})+C({u_{1}},{u_{2}})\ge 0$ for all $0\le {u_{1}}\le {v_{1}}\le 1$, $0\le {u_{2}}\le {v_{2}}\le 1$.
The theoretical foundation of copulas is given by Sklar’s theorem:
Theorem 3 ([20]).
Let H be a joint cumulative distribution function (cdf) with marginal distributions ${F_{1}},{F_{2}}$. Then there exists a copula C such that for all $({x_{1}},{x_{2}})\in {[-\infty ,\infty ]^{2}}$:
(7)
\[ H({x_{1}},{x_{2}})=C\big({F_{1}}({x_{1}}),{F_{2}}({x_{2}})\big).\]
If ${F_{i}}$ is continuous for $i=1,2$ then C is unique; otherwise C is uniquely determined only on $\mathrm{Ran}({F_{1}})\times \mathrm{Ran}({F_{2}})$, where $\mathrm{Ran}(F)$ denotes the range of the cdf F. Conversely, if C is a copula and ${F_{1}},{F_{2}}$ are distribution functions, then the function H, defined by equation (7) is a joint cdf with marginal distributions ${F_{1}},{F_{2}}$.
If a pair of random variables $({X_{1}},{X_{2}})$ has continuous marginal cdfs ${F_{i}}(x),i=1,2$, then by applying the probability integral transformation one can transform them into random variables $({U_{1}},{U_{2}})=({F_{1}}({X_{1}}),{F_{2}}({X_{2}}))$ with uniformly distributed marginals, which can then be used when modelling their dependence via a copula. More about copula theory, properties and applications can be found in [15] and [9].
3.2 Copulas with discrete marginal distributions
Since innovations of a BINAR(1) model are nonnegative integer-valued random variables, one needs to consider copulas linking discrete distributions. In this section we will mention some of the key differences when copula marginals are discrete rather than continuous.
Firstly, as mentioned in Theorem 3, if ${F_{1}}$ and ${F_{2}}$ are discrete marginals, then a unique copula representation exists only on $\mathrm{Ran}({F_{1}})\times \mathrm{Ran}({F_{2}})$. This lack of uniqueness does not pose a problem in empirical applications: it only means that there may exist more than one copula describing the distribution of the empirical data. Secondly, regarding concordance and discordance, the discrete case has to allow for ties (i.e. when two variables have the same value), so the concordance measures (Spearman’s rho and Kendall’s tau) become margin-dependent, see [21]. Several modifications of Spearman’s rho have been proposed; however, none of them are margin-free. Furthermore, Genest and Nešlehová [8] state that estimators of the dependence parameter θ based on Kendall’s tau or its modified versions are biased, and estimation techniques based on maximum likelihood are recommended. As such, we will not examine estimation methods based on concordance measures. Another difference from the continuous case is the use of the probability mass function (pmf) instead of the probability density function when estimating the model parameters, as will be seen in Section 4.
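As an illustration of working with a pmf obtained from a copula (a sketch of our own, not code from the paper), the following R snippet evaluates the joint pmf of two Poisson marginals linked by a Frank copula by taking finite differences of the copula cdf; the names frank_cop and joint_pmf are ours.

```r
# Frank copula cdf, dependence parameter theta != 0
frank_cop <- function(u1, u2, theta) {
  -log(1 + (exp(-theta * u1) - 1) * (exp(-theta * u2) - 1) /
         (exp(-theta) - 1)) / theta
}

# Joint pmf P(R1 = k, R2 = l) for Poisson(lambda1), Poisson(lambda2) marginals
# linked by the copula: finite differences ("rectangle" formula) of the cdf.
joint_pmf <- function(k, l, lambda1, lambda2, theta) {
  F1  <- ppois(k,     lambda1); F1m <- ppois(k - 1, lambda1)
  F2  <- ppois(l,     lambda2); F2m <- ppois(l - 1, lambda2)
  frank_cop(F1, F2, theta) - frank_cop(F1m, F2, theta) -
    frank_cop(F1, F2m, theta) + frank_cop(F1m, F2m, theta)
}

# Probabilities over a truncated grid sum to (almost) one:
grid <- outer(0:30, 0:30, joint_pmf, lambda1 = 1, lambda2 = 2, theta = 2)
sum(grid)
```

The same finite-difference construction is used for the other copulas considered below and reappears in equation (16) of Section 4.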
3.3 Some concrete copulas
In this section we will present several bivariate copulas, which will be used later when constructing and evaluating the BINAR(1) model. For all the copulas discussed, the following notation is used: ${u_{1}}:={F_{1}}({x_{1}})$, ${u_{2}}:={F_{2}}({x_{2}})$, where ${F_{1}},{F_{2}}$ are marginal cumulative distribution functions (cdfs) of discrete random variables, and θ is the dependence parameter.
Farlie–Gumbel–Morgenstern copula
The Farlie–Gumbel–Morgenstern (FGM) copula has the following form:
(8)
\[\begin{aligned}{}C({u_{1}},{u_{2}};\theta )& ={u_{1}}{u_{2}}\big(1+\theta (1-{u_{1}})(1-{u_{2}})\big).\end{aligned}\]
The dependence parameter θ can take values from the interval $[-1,1]$. If $\theta =0$, then the FGM copula collapses to independence. Note that the FGM copula can only model weak dependence between two marginals (see [15]). The copula obtained when $\theta =0$ is called the product (or independence) copula:
\[\begin{aligned}{}C({u_{1}},{u_{2}})& ={u_{1}}{u_{2}}.\end{aligned}\]
Since the product copula corresponds to independence, it is important as a benchmark.

Frank copula
The Frank copula has the following form:
\[\begin{aligned}{}C({u_{1}},{u_{2}};\theta )& =-\frac{1}{\theta }\log \bigg(1+\frac{(\exp (-\theta {u_{1}})-1)(\exp (-\theta {u_{2}})-1)}{\exp (-\theta )-1}\bigg).\end{aligned}\]
The dependence parameter θ can take values from $(-\infty ,\infty )\setminus \{0\}$. The Frank copula allows for both positive and negative dependence between the marginals.

Clayton copula
The Clayton copula has the following form:
(10)
\[\begin{aligned}{}C({u_{1}},{u_{2}};\theta )& =\max {\big\{{u_{1}^{-\theta }}+{u_{2}^{-\theta }}-1,0\big\}^{-\frac{1}{\theta }}},\end{aligned}\]
with the dependence parameter $\theta \in [-1,\infty )\setminus \{0\}$. The marginals become independent as $\theta \to 0$. The Clayton copula can be used when two random variables exhibit strong left tail dependence: small values are strongly correlated, while high values are less correlated. It can also account for negative dependence when $\theta \in [-1,0)$. For more properties of this copula, see the recent paper by Manstavičius and Leipus [14].

4 Parameter estimation of the copula-based BINAR(1) model
In this section we examine different BINAR(1) model parameter estimation methods and provide a two-step method for separate estimation of the copula dependence parameter. Estimation methods are compared via Monte Carlo simulations. Let ${\textbf{X}_{t}}={({X_{1,t}},{X_{2,t}})^{\prime }}$ be a non-negative integer-valued time series given in Def. 1, where the joint distribution of ${({R_{1,t}},{R_{2,t}})^{\prime }}$, with marginals ${F_{1}},{F_{2}}$, is linked by a copula $C(\cdot ,\cdot )$:
\[\begin{aligned}{}\mathbb{P}({R_{1,t}}\le {x_{1}},{R_{2,t}}\le {x_{2}})& =C\big({F_{1}}({x_{1}}),{F_{2}}({x_{2}})\big)\end{aligned}\]
and let $C({u_{1}},{u_{2}})=C({u_{1}},{u_{2}};\theta )$, where θ is a dependence parameter.

4.1 Conditional least squares estimation
The conditional least squares (CLS) estimator minimizes the squared distance between ${\textbf{X}_{t}}$ and its conditional expectation. Similarly to the method in [19] for the INAR(1) model, we construct the CLS estimator for the BINAR(1) model.
Using Theorem 1 we can write the vector of conditional means as
(11)
\[ {\boldsymbol{\mu }_{t|t-1}}:=\left[\begin{array}{c}\mathbb{E}({X_{1,t}}|{X_{1,t-1}})\\ {} \mathbb{E}({X_{2,t}}|{X_{2,t-1}})\end{array}\right]=\left[\begin{array}{c}{\alpha _{1}}{X_{1,t-1}}+{\lambda _{1}}\\ {} {\alpha _{2}}{X_{2,t-1}}+{\lambda _{2}}\end{array}\right],\]
where ${\lambda _{j}}:=\mathbb{E}{R_{j,t}}$, $j=1,2$. In order to calculate the CLS estimators of $({\alpha _{1}},{\alpha _{2}},{\lambda _{1}},{\lambda _{2}})$ we define the vector of residuals as the difference between the observations and their conditional expectation:
\[\begin{aligned}{}{\textbf{X}_{t}}-{\boldsymbol{\mu }_{t|t-1}}& =\left[\begin{array}{c}{X_{1,t}}-{\alpha _{1}}{X_{1,t-1}}-{\lambda _{1}}\\ {} {X_{2,t}}-{\alpha _{2}}{X_{2,t-1}}-{\lambda _{2}}\end{array}\right].\end{aligned}\]
Then, given a sample of N observations, ${\textbf{X}_{1}},\dots ,{\textbf{X}_{N}}$, the CLS estimators of ${\alpha _{j}},{\lambda _{j}}$, $j=1,2$, are found by minimizing the sum
\[\begin{aligned}{}{Q_{j}}({\alpha _{j}},{\lambda _{j}})& :={\sum \limits_{t=2}^{N}}{({X_{j,t}}-{\alpha _{j}}{X_{j,t-1}}-{\lambda _{j}})^{2}}\hspace{2.5pt}\longrightarrow \hspace{2.5pt}\underset{{\alpha _{j}},{\lambda _{j}}}{\min },\hspace{1em}j=1,2.\end{aligned}\]
By taking the derivatives with respect to ${\alpha _{j}}$ and ${\lambda _{j}}$, $j=1,2$, and equating them to zero we get:
(12)
\[ {\hat{\alpha }_{j}^{\mathrm{CLS}}}=\frac{{\textstyle\sum _{t=2}^{N}}({X_{j,t}}-{\bar{X}_{j}})({X_{j,t-1}}-{\bar{X}_{j}})}{{\textstyle\sum _{t=2}^{N}}{({X_{j,t-1}}-{\bar{X}_{j}})^{2}}},\]
(13)
\[\begin{aligned}{}{\hat{\lambda }_{j}^{\mathrm{CLS}}}& =\frac{1}{N-1}\Bigg({\sum \limits_{t=2}^{N}}{X_{j,t}}-{\hat{\alpha }_{j}^{\mathrm{CLS}}}{\sum \limits_{t=2}^{N}}{X_{j,t-1}}\Bigg).\end{aligned}\]
The asymptotic properties of the CLS estimators for the INAR(1) model case are provided in [13, 19, 2] and can be applied to the BINAR(1) parameter estimates specified via equations (12) and (13). Since the jth component of the BINAR(1) process is an INAR(1) process itself, we can formulate the following theorem for the marginal parameter vector distributions (see [2]):

Theorem 4.
Let ${\mathbf{X}_{t}}={({X_{1,t}},{X_{2,t}})^{\prime }}$ be defined in Def. 1 and let the parameter vector of (2) be ${({\alpha _{j}},{\lambda _{j}})^{\prime }}$. Assume that ${\widehat{\alpha }_{j}^{\mathrm{CLS}}}$ and ${\widehat{\lambda }_{j}^{\mathrm{CLS}}}$ are the CLS estimators of ${\alpha _{j}}$ and ${\lambda _{j}}$, $j=1,2$. Then:
\[ \sqrt{N}\left(\begin{array}{c}{\widehat{\alpha }_{j}^{\mathrm{CLS}}}-{\alpha _{j}}\\ {} {\widehat{\lambda }_{j}^{\mathrm{CLS}}}-{\lambda _{j}}\end{array}\right)\stackrel{d}{\longrightarrow }\mathcal{N}({\mathbf{0}_{2}},{\mathbf{B}_{j}}),\]
where
\[\begin{aligned}{}{\mathbf{B}_{j}}& ={\left[\begin{array}{c@{\hskip10.0pt}c}\mathbb{E}{X_{j,t}^{2}}& \mathbb{E}{X_{j,t}}\\ {} \mathbb{E}{X_{j,t}}& 1\end{array}\right]^{-1}}{\mathbf{A}_{j}}{\left[\begin{array}{c@{\hskip10.0pt}c}\mathbb{E}{X_{j,t}^{2}}& \mathbb{E}{X_{j,t}}\\ {} \mathbb{E}{X_{j,t}}& 1\end{array}\right]^{-1}},\\ {} {\mathbf{A}_{j}}& ={\alpha _{j}}(1-{\alpha _{j}})\left[\begin{array}{c@{\hskip10.0pt}c}\mathbb{E}{X_{j,t}^{3}}& \mathbb{E}{X_{j,t}^{2}}\\ {} \mathbb{E}{X_{j,t}^{2}}& \mathbb{E}{X_{j,t}}\end{array}\right]+{\sigma _{j}^{2}}\left[\begin{array}{c@{\hskip10.0pt}c}\mathbb{E}{X_{j,t}^{2}}& \mathbb{E}{X_{j,t}}\\ {} \mathbb{E}{X_{j,t}}& 1\end{array}\right],\hspace{1em}j=1,2.\end{aligned}\]
Here, according to BINAR(1) properties in Theorem 2,
\[\begin{aligned}{}\mathbb{E}{X_{j,t}}=& \frac{{\lambda _{j}}}{1-{\alpha _{j}}},\hspace{2.5pt}\hspace{2.5pt}\mathbb{E}{X_{j,t}^{2}}=\frac{{\sigma _{j}^{2}}+{\alpha _{j}}{\lambda _{j}}}{1-{\alpha _{j}^{2}}}+\frac{{\lambda _{j}^{2}}}{{(1-{\alpha _{j}})^{2}}},\\ {} \mathbb{E}{X_{j,t}^{3}}=& \frac{\mathbb{E}{R_{j,t}^{3}}-3{\sigma _{j}^{2}}(1+{\lambda _{j}})-{\lambda _{j}^{3}}+2{\lambda _{j}}}{1-{\alpha _{j}^{3}}}+3\frac{{\sigma _{j}^{2}}+{\alpha _{j}}{\lambda _{j}}}{1-{\alpha _{j}^{2}}}-2\frac{{\lambda _{j}}}{1-{\alpha _{j}}}\\ {} & +3\frac{{\lambda _{j}}({\sigma _{j}^{2}}+{\alpha _{j}}{\lambda _{j}})}{(1-{\alpha _{j}})(1-{\alpha _{j}^{2}})}+\frac{{\lambda _{j}^{3}}}{{(1-{\alpha _{j}})^{3}}}.\end{aligned}\]
For the Poisson marginal distribution case the asymptotic variance matrix can be expressed as (see [7])
\[ {\mathbf{B}_{j}}=\left[\begin{array}{c@{\hskip10.0pt}c}\frac{{\alpha _{j}}{(1-{\alpha _{j}})^{2}}}{{\lambda _{j}}}+1-{\alpha _{j}^{2}}& -(1+{\alpha _{j}}){\lambda _{j}}\\ {} -(1+{\alpha _{j}}){\lambda _{j}}& {\lambda _{j}}+\frac{1+{\alpha _{j}}}{1-{\alpha _{j}}}{\lambda _{j}^{2}}\end{array}\right],\hspace{1em}j=1,2.\]
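As a small numerical companion (our own sketch, not code from the paper), the Poisson-case matrix ${\mathbf{B}_{j}}$ above is easy to evaluate; dividing its diagonal by the sample size gives approximate standard errors for $({\hat{\alpha }_{j}^{\mathrm{CLS}}},{\hat{\lambda }_{j}^{\mathrm{CLS}}})$. The parameter values below are purely illustrative.

```r
# Asymptotic covariance matrix B_j of the CLS estimators (Poisson innovations)
cls_asym_cov <- function(alpha, lambda) {
  matrix(c(alpha * (1 - alpha)^2 / lambda + 1 - alpha^2, -(1 + alpha) * lambda,
           -(1 + alpha) * lambda, lambda + (1 + alpha) / (1 - alpha) * lambda^2),
         nrow = 2, byrow = TRUE)
}

# Large-sample standard errors of (alpha_hat, lambda_hat) for a series of length N
N <- 500
sqrt(diag(cls_asym_cov(alpha = 0.6, lambda = 1)) / N)
```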
Furthermore, for a more general case, [12] proved that the CLS estimators of a multivariate generalized integer-valued autoregressive process (GINAR) are asymptotically normally distributed.

Note that
(14)
\[\begin{aligned}{}\mathbb{E}({X_{1,t}}-{\alpha _{1}}{X_{1,t-1}}-{\lambda _{1}})({X_{2,t}}-{\alpha _{2}}{X_{2,t-1}}-{\lambda _{2}})& =\mathbb{C}\mathrm{ov}({R_{1,t}},{R_{2,t}}),\end{aligned}\]
which follows from
\[\begin{aligned}{}& \mathbb{E}({X_{1,t}}-{\alpha _{1}}{X_{1,t-1}}-{\lambda _{1}})({X_{2,t}}-{\alpha _{2}}{X_{2,t-1}}-{\lambda _{2}})\\ {} & \hspace{1em}=\mathbb{E}({\alpha _{1}}\circ {X_{1,t-1}}-{\alpha _{1}}{X_{1,t-1}})({\alpha _{2}}\circ {X_{2,t-1}}-{\alpha _{2}}{X_{2,t-1}})\\ {} & \hspace{2em}+\mathbb{E}({\alpha _{1}}\circ {X_{1,t-1}}-{\alpha _{1}}{X_{1,t-1}})({R_{2,t}}-{\lambda _{2}})\\ {} & \hspace{2em}+\mathbb{E}({\alpha _{2}}\circ {X_{2,t-1}}-{\alpha _{2}}{X_{2,t-1}})({R_{1,t}}-{\lambda _{1}})\\ {} & \hspace{2em}+\mathbb{E}({R_{1,t}}-{\lambda _{1}})({R_{2,t}}-{\lambda _{2}})\end{aligned}\]
since the first three summands are equal to zero.

Example 4.1.
Assume that the joint pmf of $({R_{1,t}},{R_{2,t}})$ is given by the bivariate Poisson distribution:
\[\begin{aligned}{}\mathbb{P}({R_{1,t}}=k,{R_{2,t}}=l)& ={\sum \limits_{i=0}^{\min \{k,l\}}}\frac{{({\lambda _{1}}-\lambda )^{k-i}}{({\lambda _{2}}-\lambda )^{l-i}}{\lambda ^{i}}}{(k-i)!(l-i)!i!}\hspace{0.1667em}{\mathrm{e}^{-({\lambda _{1}}+{\lambda _{2}}-\lambda )}},\end{aligned}\]
where $k,l=0,1,...$, ${\lambda _{j}}\mathrm{>}0$, $j=1,2$, $0\le \lambda \mathrm{<}\min \{{\lambda _{1}},{\lambda _{2}}\}$. Then, for each $j=1,2$, the marginal distribution of ${R_{j,t}}$ is Poisson with parameter ${\lambda _{j}}$ and $\mathbb{C}\mathrm{ov}({R_{1,t}},{R_{2,t}})=\lambda $. If $\lambda =0$ then the two variables are independent.

Example 4.2.
Assume that the joint pmf of $({R_{1,t}},{R_{2,t}})$ is the bivariate negative binomial distribution given by
\[\begin{aligned}{}\mathbb{P}({R_{1,t}}=k,{R_{2,t}}=l)=& \frac{\varGamma (\beta +k+l)}{\varGamma (\beta )k!l!}{\bigg(\frac{{\lambda _{1}}}{{\lambda _{1}}+{\lambda _{2}}+\beta }\bigg)^{k}}{\bigg(\frac{{\lambda _{2}}}{{\lambda _{1}}+{\lambda _{2}}+\beta }\bigg)^{l}}\\ {} & \times {\bigg(\frac{\beta }{{\lambda _{1}}+{\lambda _{2}}+\beta }\bigg)^{\beta }},\end{aligned}\]
where $k,l=0,1,...$, ${\lambda _{j}}\mathrm{>}0$, $j=1,2$, $\beta \mathrm{>}0$. Then, for each $j=1,2$, the marginal distribution of ${R_{j,t}}$ is negative binomial with parameters β and ${p_{j}}=\beta /({\lambda _{j}}+\beta )$ and $\mathbb{E}{R_{j,t}}={\lambda _{j}}$, $\mathbb{V}\mathrm{ar}({R_{j,t}})={\lambda _{j}}(1+{\beta ^{-1}}{\lambda _{j}})$, $\mathbb{C}\mathrm{ov}({R_{1,t}},{R_{2,t}})={\beta ^{-1}}{\lambda _{1}}{\lambda _{2}}$. Thus, the bivariate negative binomial distribution is more flexible than the bivariate Poisson one due to the overdispersion parameter β.

Assume now that the Poisson innovations ${R_{1,t}}$ and ${R_{2,t}}$ with parameters ${\lambda _{1}}$ and ${\lambda _{2}}$, respectively, are linked by a copula with the dependence parameter θ. Taking into account equality (14), we can estimate θ by minimizing the sum of squared differences
(15)
\[\begin{aligned}{}S& ={\sum \limits_{t=2}^{N}}{\big({R_{1,t}^{\mathrm{CLS}}}{R_{2,t}^{\mathrm{CLS}}}-\gamma \big({\hat{\lambda }_{1}^{\mathrm{CLS}}},{\hat{\lambda }_{2}^{\mathrm{CLS}}};\theta \big)\big)^{2}},\end{aligned}\]
where
\[\begin{aligned}{}{R_{j,t}^{\mathrm{CLS}}}& :={X_{j,t}}-{\hat{\alpha }_{j}^{\mathrm{CLS}}}{X_{j,t-1}}-{\hat{\lambda }_{j}^{\mathrm{CLS}}},\hspace{1em}j=1,2,\\ {} \gamma ({\lambda _{1}},{\lambda _{2}};\theta )& :=\mathbb{C}\mathrm{ov}({R_{1,t}},{R_{2,t}})\hspace{2.5pt}={\sum \limits_{k,l=1}^{\infty }}kl\hspace{0.1667em}c\big({F_{1}}(k;{\lambda _{1}}),{F_{2}}(l;{\lambda _{2}});\theta \big)-{\lambda _{1}}{\lambda _{2}}.\end{aligned}\]
Here, $c({F_{1}}(k;{\lambda _{1}}),{F_{2}}(l;{\lambda _{2}});\theta )$ is the joint pmf:
(16)
\[\begin{aligned}{}c\big({F_{1}}(k;{\lambda _{1}}),{F_{2}}(l;{\lambda _{2}});\theta \big)=& \mathbb{P}({R_{1,t}}=k,{R_{2,t}}=l)\\ {} =& C\big({F_{1}}(k;{\lambda _{1}}),{F_{2}}(l;{\lambda _{2}});\theta \big)\\ {} & -C\big({F_{1}}(k-1;{\lambda _{1}}),{F_{2}}(l;{\lambda _{2}});\theta \big)\\ {} & -\hspace{2.5pt}C\big({F_{1}}(k;{\lambda _{1}}),{F_{2}}(l-1;{\lambda _{2}});\theta \big)\\ {} & +\hspace{2.5pt}C\big({F_{1}}(k-1;{\lambda _{1}}),{F_{2}}(l-1;{\lambda _{2}});\theta \big),\hspace{1em}k\ge 1,l\ge 1.\end{aligned}\]
Our estimation method is based on the approximation of the covariance $\gamma ({\hat{\lambda }_{1}^{\mathrm{CLS}}},{\hat{\lambda }_{2}^{\mathrm{CLS}}};\theta )$ by
(17)
\[\begin{aligned}{}{\gamma ^{({M_{1}},{M_{2}})}}\big({\hat{\lambda }_{1}^{\mathrm{CLS}}},{\hat{\lambda }_{2}^{\mathrm{CLS}}};\theta \big)& ={\sum \limits_{k=1}^{{M_{1}}}}{\sum \limits_{l=1}^{{M_{2}}}}kl\hspace{0.1667em}c\big({F_{1}}\big(k;{\hat{\lambda }_{1}^{\mathrm{CLS}}}\big),{F_{2}}\big(l;{\hat{\lambda }_{2}^{\mathrm{CLS}}}\big);\theta \big)-{\hat{\lambda }_{1}^{\mathrm{CLS}}}{\hat{\lambda }_{2}^{\mathrm{CLS}}}.\end{aligned}\]
For example, if the marginals are Poisson with parameters ${\lambda _{1}}={\lambda _{2}}=1$ and their joint distribution is given by the FGM copula in (8), then the covariance ${\gamma ^{({M_{1}},{M_{2}})}}(1,1;\theta )$ stops changing significantly after setting ${M_{1}}={M_{2}}=M=8$, regardless of the selected dependence parameter θ. We used this approximation when carrying out the Monte Carlo simulations in Section 4.4.

For the FGM copula, if we take the derivative of the sum
(18)
\[\begin{aligned}{}{S^{({M_{1}},{M_{2}})}}& ={\sum \limits_{t=2}^{N}}{\big({R_{1,t}^{\mathrm{CLS}}}{R_{2,t}^{\mathrm{CLS}}}-{\gamma ^{({M_{1}},{M_{2}})}}\big({\hat{\lambda }_{1}^{\mathrm{CLS}}},{\hat{\lambda }_{2}^{\mathrm{CLS}}};\theta \big)\big)^{2}},\end{aligned}\]
equate it to zero and use equation (17), we get
(19)
\[ {\hat{\theta }^{\mathrm{FGM}}}\hspace{0.1667em}=\hspace{0.1667em}\frac{{\textstyle\sum _{t=2}^{N}}({X_{1,t}}-{\hat{\alpha }_{1}^{\mathrm{CLS}}}{X_{1,t-1}}-{\hat{\lambda }_{1}^{\mathrm{CLS}}})({X_{2,t}}-{\hat{\alpha }_{2}^{\mathrm{CLS}}}{X_{2,t-1}}-{\hat{\lambda }_{2}^{\mathrm{CLS}}})}{(N\hspace{-0.1667em}-\hspace{-0.1667em}1){\textstyle\sum _{k=1}^{{M_{1}}}}k({F_{1,k}}{\overline{F}_{1,k}}\hspace{0.1667em}-\hspace{0.1667em}{F_{1,k-1}}{\overline{F}_{1,k-1}}){\textstyle\sum _{l=1}^{{M_{2}}}}l({F_{2,l}}{\overline{F}_{2,l}}-{F_{2,l-1}}{\overline{F}_{2,l-1}})},\]
where ${F_{j,k}}:={F_{j}}(k;{\hat{\lambda }_{j}^{\mathrm{CLS}}})$, ${\overline{F}_{j,k}}:=1-{F_{j,k}}$, $j=1,2$. The derivation of equation (19) is straightforward and thus omitted.

Depending on the selected copula family, calculation of (16) to get an analytical expression of the estimator $\hat{\theta }$ may be difficult. However, we can use the function optim in the R statistical software to minimize (15). In other cases, where the marginal distribution has parameters other than the expected value ${\lambda _{j}}$, equation (15) would need to be minimized over those additional parameters as well. For example, in the case of negative binomial marginals with corresponding mean ${\lambda _{j}}$ and variance ${\sigma _{j}^{2}}$, i.e. when
\[\begin{aligned}{}\mathbb{P}({R_{j,t}}=k)& =\frac{\varGamma (k+\frac{{\lambda _{j}^{2}}}{{\sigma _{j}^{2}}-{\lambda _{j}}})}{\varGamma (\frac{{\lambda _{j}^{2}}}{{\sigma _{j}^{2}}-{\lambda _{j}}})k!}{\bigg(\frac{{\lambda _{j}}}{{\sigma _{j}^{2}}}\bigg)^{\frac{{\lambda _{j}^{2}}}{{\sigma _{j}^{2}}-{\lambda _{j}}}}}{\bigg(\frac{{\sigma _{j}^{2}}-{\lambda _{j}}}{{\sigma _{j}^{2}}}\bigg)^{k}},\hspace{1em}k=0,1,\dots ,\hspace{2.5pt}j=1,2,\end{aligned}\]
the additional parameters are ${\sigma _{1}^{2}},{\sigma _{2}^{2}}$, and the minimization problem becomes
\[ {S^{({M_{1}},{M_{2}})}}\hspace{2.5pt}\longrightarrow \hspace{2.5pt}\underset{\theta ,{\sigma _{1}^{2}},{\sigma _{2}^{2}}}{\min }.\]
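To summarize the CLS step in code, here is a minimal R sketch assuming Poisson marginals linked by the FGM copula; the helper names (cls_inar, gamma_trunc, theta_cls) and the input series x1, x2 are our own, and the truncation M = 8 follows the remark after equation (17).

```r
# CLS estimation of (alpha_j, lambda_j) for one component (eq. (12)-(13))
cls_inar <- function(x) {
  n    <- length(x)
  y    <- x[2:n]; ylag <- x[1:(n - 1)]
  xbar <- mean(y)                        # sample mean (here over t = 2, ..., N)
  alpha  <- sum((y - xbar) * (ylag - xbar)) / sum((ylag - xbar)^2)
  lambda <- (sum(y) - alpha * sum(ylag)) / (n - 1)
  list(alpha = alpha, lambda = lambda, resid = y - alpha * ylag - lambda)
}

# FGM copula cdf and the implied joint pmf of Poisson marginals (eq. (16))
fgm_cop   <- function(u1, u2, theta) u1 * u2 * (1 + theta * (1 - u1) * (1 - u2))
joint_pmf <- function(k, l, lam1, lam2, theta) {
  fgm_cop(ppois(k, lam1), ppois(l, lam2), theta) -
  fgm_cop(ppois(k - 1, lam1), ppois(l, lam2), theta) -
  fgm_cop(ppois(k, lam1), ppois(l - 1, lam2), theta) +
  fgm_cop(ppois(k - 1, lam1), ppois(l - 1, lam2), theta)
}

# Truncated covariance gamma^(M1,M2) (eq. (17))
gamma_trunc <- function(lam1, lam2, theta, M = 8) {
  kl <- expand.grid(k = 1:M, l = 1:M)
  sum(kl$k * kl$l * joint_pmf(kl$k, kl$l, lam1, lam2, theta)) - lam1 * lam2
}

# CLS-type estimate of theta: minimize S^(M1,M2) (eq. (18)) numerically
theta_cls <- function(x1, x2, M = 8) {
  f1 <- cls_inar(x1); f2 <- cls_inar(x2)
  S  <- function(theta)
    sum((f1$resid * f2$resid - gamma_trunc(f1$lambda, f2$lambda, theta, M))^2)
  optim(0, S, method = "Brent", lower = -1, upper = 1)$par   # FGM: theta in [-1, 1]
}
```

For other copulas one would replace fgm_cop by the corresponding cdf and adjust the admissible range of θ accordingly.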
4.2 Conditional maximum likelihood estimation
BINAR(1) models can be estimated via conditional maximum likelihood (CML) (see [18] and [10]). The conditional distribution of the BINAR(1) process is:
\[\begin{aligned}{}\mathbb{P}& ({X_{1,t}}={x_{1,t}},{X_{2,t}}={x_{2,t}}|{X_{1,t-1}}={x_{1,t-1}},{X_{2,t-1}}={x_{2,t-1}})\\ {} & =\mathbb{P}({\alpha _{1}}\circ {x_{1,t-1}}+{R_{1,t}}={x_{1,t}},{\alpha _{2}}\circ {x_{2,t-1}}+{R_{2,t}}={x_{2,t}})\\ {} & ={\sum \limits_{k=0}^{{x_{1,t}}}}{\sum \limits_{l=0}^{{x_{2,t}}}}\mathbb{P}({\alpha _{1}}\circ {x_{1,t-1}}\hspace{0.1667em}=\hspace{0.1667em}k)\mathbb{P}({\alpha _{2}}\circ {x_{2,t-1}}\hspace{0.1667em}=\hspace{0.1667em}l)\mathbb{P}({R_{1,t}}\hspace{0.1667em}=\hspace{0.1667em}{x_{1,t}}-k,{R_{2,t}}\hspace{0.1667em}=\hspace{0.1667em}{x_{2,t}}-l).\end{aligned}\]
Here, ${\alpha _{j}}\circ x$ is the sum of x independent Bernoulli trials. Hence,
\[ \mathbb{P}({\alpha _{j}}\circ {x_{j,t-1}}=k)=\left(\genfrac{}{}{0pt}{}{{x_{j,t-1}}}{k}\right){\alpha _{j}^{k}}{(1-{\alpha _{j}})^{{x_{j,t-1}}-k}},\hspace{1em}k=0,1,\dots ,{x_{j,t-1}}.\]
In the case of a copula-based BINAR(1) model with Poisson marginals,
\[\begin{aligned}{}\mathbb{P}({R_{1,t}}={x_{1,t}}-k,{R_{2,t}}={x_{2,t}}-l)& =c\big({F_{1}}({x_{1,t}}-k,{\lambda _{1}}),{F_{2}}({x_{2,t}}-l,{\lambda _{2}});\theta \big).\end{aligned}\]
Thus, we obtain
\[\begin{aligned}{}\mathbb{P}& ({X_{1,t}}={x_{1,t}},{X_{2,t}}={x_{2,t}}|{X_{1,t-1}}={x_{1,t-1}},{X_{2,t-1}}={x_{2,t-1}})\\ {} & ={\sum \limits_{k=0}^{{x_{1,t}}}}{\sum \limits_{l=0}^{{x_{2,t}}}}\left(\genfrac{}{}{0pt}{}{{x_{1,t-1}}}{k}\right){\alpha _{1}^{k}}{(1-{\alpha _{1}})^{{x_{1,t-1}}-k}}\left(\genfrac{}{}{0pt}{}{{x_{2,t-1}}}{l}\right){\alpha _{2}^{l}}{(1-{\alpha _{2}})^{{x_{2,t-1}}-l}}\\ {} & \hspace{2.5pt}\hspace{2.5pt}\times c\big({F_{1}}({x_{1,t}}-k,{\lambda _{1}}),{F_{2}}({x_{2,t}}-l,{\lambda _{2}});\theta \big)\end{aligned}\]
and the log conditional likelihood function, for estimating the marginal distribution parameters ${\lambda _{1}},{\lambda _{2}}$, the probabilities of the Bernoulli trial successes ${\alpha _{1}},{\alpha _{2}}$ and the dependence parameter θ, is
\[\begin{aligned}{}\ell ({\alpha _{1}},{\alpha _{2}},{\lambda _{1}},{\lambda _{2}},\theta )={\sum \limits_{t=2}^{N}}\log \mathbb{P}(& {X_{1,t}}={x_{1,t}},{X_{2,t}}={x_{2,t}}|{X_{1,t-1}}={x_{1,t-1}},\\ {} & {X_{2,t-1}}={x_{2,t-1}})\end{aligned}\]
for some initial values ${x_{1,1}}$ and ${x_{2,1}}$. In order to estimate the unknown parameters we maximize the log conditional likelihood:
(20)
\[ \big({\hat{\alpha }_{1}^{\mathrm{CML}}},{\hat{\alpha }_{2}^{\mathrm{CML}}},{\hat{\lambda }_{1}^{\mathrm{CML}}},{\hat{\lambda }_{2}^{\mathrm{CML}}},{\hat{\theta }^{\mathrm{CML}}}\big)=\underset{{\alpha _{1}},{\alpha _{2}},{\lambda _{1}},{\lambda _{2}},\theta }{\arg \max }\hspace{2.5pt}\ell ({\alpha _{1}},{\alpha _{2}},{\lambda _{1}},{\lambda _{2}},\theta ).\]
Numerical maximization is straightforward with the optim function in the R statistical software. As for the CLS estimator, in other cases, where the marginal distribution has parameters other than ${\lambda _{j}}$, equation (20) would need to be maximized over those additional parameters as well. The CML estimator is asymptotically normally distributed under standard regularity conditions and its variance matrix is the inverse of the Fisher information matrix [18].
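The conditional log-likelihood above is straightforward to code; the following R sketch (our own illustration, assuming Poisson marginals linked by the FGM copula) evaluates ℓ and indicates how it could be maximized with optim. The names binar_loglik, innov_pmf and the series x1, x2 are ours, as are the hypothetical starting values.

```r
# Joint pmf of the innovations: FGM copula with Poisson marginals
fgm_cop   <- function(u1, u2, theta) u1 * u2 * (1 + theta * (1 - u1) * (1 - u2))
innov_pmf <- function(k, l, lam1, lam2, theta) {
  fgm_cop(ppois(k, lam1), ppois(l, lam2), theta) -
  fgm_cop(ppois(k - 1, lam1), ppois(l, lam2), theta) -
  fgm_cop(ppois(k, lam1), ppois(l - 1, lam2), theta) +
  fgm_cop(ppois(k - 1, lam1), ppois(l - 1, lam2), theta)
}

# Conditional log-likelihood: par = c(alpha1, alpha2, lambda1, lambda2, theta)
binar_loglik <- function(par, x1, x2) {
  a1 <- par[1]; a2 <- par[2]; l1 <- par[3]; l2 <- par[4]; th <- par[5]
  ll <- 0
  for (t in 2:length(x1)) {
    k <- 0:x1[t]; l <- 0:x2[t]
    p1 <- dbinom(k, size = x1[t - 1], prob = a1)   # P(alpha_1 o x_{1,t-1} = k)
    p2 <- dbinom(l, size = x2[t - 1], prob = a2)   # P(alpha_2 o x_{2,t-1} = l)
    pr <- outer(k, l, function(k, l) innov_pmf(x1[t] - k, x2[t] - l, l1, l2, th))
    ll <- ll + log(sum(p1 * pr %*% p2))
  }
  ll
}

# CML: maximize over all five parameters, starting from the CLS estimates
# (alpha1_cls, ..., are hypothetical variables holding the first-step values)
# fit <- optim(par = c(alpha1_cls, alpha2_cls, lambda1_cls, lambda2_cls, 0),
#              fn = binar_loglik, x1 = x1, x2 = x2,
#              control = list(fnscale = -1))   # fnscale = -1 turns optim into a maximizer
```

For the two-step method of the next subsection, the first four coordinates of par would be fixed at their CLS estimates and the maximization carried out over θ only.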
4.3 Two-step estimation based on CLS and CML
Depending on the range of attainable parameter values and the sample size, CML maximization might take some time to compute. On the other hand, since the CLS estimators of ${\alpha _{j}}$ and ${\lambda _{j}}$ are easily derived (compared to the CLS estimator of θ, which depends on the form of the copula pmf and has to be found numerically), we can substitute the parameters of the marginal distributions in eq. (20) with the CLS estimates from equations (12) and (13). Then we only need to maximize ℓ with respect to the single dependence parameter θ in the Poisson marginal distribution case.
Summarizing, the two-step approach to estimating unknown parameters is to find
\[\begin{aligned}{}\big({\hat{\alpha }_{j}^{\mathrm{CLS}}},{\hat{\lambda }_{j}^{\mathrm{CLS}}}\big)& =\arg \min {Q_{j}}({\alpha _{j}},{\lambda _{j}}),\hspace{1em}j=1,2,\end{aligned}\]
and to take these values as given in the second step:
\[\begin{aligned}{}{\hat{\theta }^{\mathrm{CML}}}& =\arg \max \ell \big({\hat{\alpha }_{1}^{\mathrm{CLS}}},{\hat{\alpha }_{2}^{\mathrm{CLS}}},{\hat{\lambda }_{1}^{\mathrm{CLS}}},{\hat{\lambda }_{2}^{\mathrm{CLS}}},\theta \big).\end{aligned}\]
For other cases of the marginal distribution, any additional parameters other than ${\alpha _{j}}$ and ${\lambda _{j}}$ would be estimated in the second step.

4.4 Comparison of estimation methods via Monte Carlo simulation
We carried out 1000 Monte Carlo replications to test the estimation methods with sample sizes of 50 and 500. The generated model was a BINAR(1) model with innovations joined by either the FGM, Frank or Clayton copula with Poisson marginal distributions, as well as with marginal distributions from different families: one Poisson and the other negative binomial. Note that for the Two-step method only the estimates of θ and ${\sigma _{2}^{2}}$ are included, because the estimated values ${\alpha _{1}^{\mathrm{CLS}}},{\alpha _{2}^{\mathrm{CLS}}},{\lambda _{1}^{\mathrm{CLS}}},{\lambda _{2}^{\mathrm{CLS}}}$ are used in order to estimate the remaining parameters via CML.
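For illustration, one replication of such an experiment might look as follows in R (our own sketch, not the simulation code used for the tables), using the FGM setting of Table 1 ($\alpha_1=0.6$, $\alpha_2=0.4$, $\lambda_1=1$, $\lambda_2=2$, $\theta=-0.5$); innovation pairs are drawn from the copula-implied joint pmf on a truncated grid, and the function names are ours.

```r
fgm_cop   <- function(u1, u2, theta) u1 * u2 * (1 + theta * (1 - u1) * (1 - u2))
innov_pmf <- function(k, l, lam1, lam2, theta) {
  fgm_cop(ppois(k, lam1), ppois(l, lam2), theta) -
  fgm_cop(ppois(k - 1, lam1), ppois(l, lam2), theta) -
  fgm_cop(ppois(k, lam1), ppois(l - 1, lam2), theta) +
  fgm_cop(ppois(k - 1, lam1), ppois(l - 1, lam2), theta)
}

# Sample innovation pairs from the joint pmf on a truncated grid {0,...,M}^2
r_innov <- function(n, lam1, lam2, theta, M = 30) {
  grid <- expand.grid(r1 = 0:M, r2 = 0:M)
  p    <- pmax(innov_pmf(grid$r1, grid$r2, lam1, lam2, theta), 0)  # guard rounding
  grid[sample(nrow(grid), n, replace = TRUE, prob = p), ]
}

# Simulate a BINAR(1) path (started at zero for simplicity; a burn-in could be dropped)
r_binar <- function(n, alpha, lam1, lam2, theta) {
  R <- r_innov(n, lam1, lam2, theta)
  X <- matrix(0, n, 2)
  for (t in 2:n)
    X[t, ] <- rbinom(2, size = X[t - 1, ], prob = alpha) + unlist(R[t, ])
  X
}

set.seed(1)
X <- r_binar(500, alpha = c(0.6, 0.4), lam1 = 1, lam2 = 2, theta = -0.5)

# CLS estimates of alpha_j (eq. (12)); repeating this over many replications
# gives the bias and MSE reported in the tables
sapply(1:2, function(j) {
  y <- X[2:nrow(X), j]; ylag <- X[1:(nrow(X) - 1), j]
  sum((y - mean(y)) * (ylag - mean(y))) / sum((ylag - mean(y))^2)
})
```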
Table 1.
Monte Carlo simulation results for a BINAR(1) model with Poisson innovations linked by the FGM, Frank or Clayton copula
| Copula | Sample size | Parameter | True value | CLS MSE | CLS Bias | CML MSE | CML Bias | Two-Step MSE | Two-Step Bias |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FGM | $N=50$ | ${\alpha _{1}}$ | 0.6 | 0.01874 | −0.05823 | 0.00887 | −0.01789 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.02033 | −0.05223 | 0.01639 | −0.02751 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.12983 | 0.13325 | 0.06514 | 0.03366 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.25625 | 0.16029 | 0.19939 | 0.07597 | – | – |
| | | θ | −0.5 | **0.29789** | 0.12568 | 0.33840 | 0.07568 | 0.3311 | 0.0876 |
| | $N=500$ | ${\alpha _{1}}$ | 0.6 | 0.00147 | −0.00432 | 0.00073 | −0.00122 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.00184 | −0.00505 | 0.00129 | −0.00157 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.01012 | 0.00968 | 0.00556 | 0.00215 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.02413 | 0.01843 | 0.01763 | 0.00678 | – | – |
| | | θ | −0.5 | 0.04679 | 0.00668 | 0.04271 | −0.00700 | **0.04265** | −0.00443 |
| Frank | $N=50$ | ${\alpha _{1}}$ | 0.6 | 0.02023 | −0.06039 | 0.00950 | −0.01965 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.02005 | −0.05251 | 0.01630 | −0.02858 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.13562 | 0.13536 | 0.06740 | 0.03625 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.25687 | 0.16392 | 0.19975 | 0.08291 | – | – |
| | | θ | −1 | **1.83454** | 0.12394 | 2.05786 | 0.00860 | 1.97515 | 0.04216 |
| | $N=500$ | ${\alpha _{1}}$ | 0.6 | 0.00153 | −0.00595 | 0.00075 | −0.00249 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.00181 | −0.00582 | 0.00129 | −0.00132 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.01033 | 0.01269 | 0.00550 | 0.00421 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.02442 | 0.02129 | 0.01785 | 0.00629 | – | – |
| | | θ | −1 | 0.22084 | 0.01746 | 0.20138 | −0.01779 | **0.20070** | −0.01342 |
| Clayton | $N=50$ | ${\alpha _{1}}$ | 0.6 | 0.01826 | −0.05489 | 0.00799 | −0.013295 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.01976 | −0.05057 | 0.01585 | −0.02427 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.12679 | 0.12104 | 0.06080 | 0.01743 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.25725 | 0.15704 | 0.19934 | 0.06499 | – | – |
| | | θ | 1 | 0.71845 | 0.02621 | 0.72581 | 0.22628 | **0.62372** | 0.13283 |
| | $N=500$ | ${\alpha _{1}}$ | 0.6 | 0.00146 | −0.00518 | 0.00070 | 0.00016 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.00189 | −0.00350 | 0.00120 | −0.00049 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.00973 | 0.01137 | 0.00513 | −0.00150 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.02447 | 0.01113 | 0.01707 | 0.00065 | – | – |
| | | θ | 1 | 0.11578 | 0.03556 | 0.05864 | 0.04250 | **0.03199** | −0.01342 |
The results for the Poisson marginal distribution case are provided in Table 1. The results for the case when one innovation follows a Poisson distribution and the other follows a negative binomial one are provided in Table 2. The lowest MSE values of $\widehat{\theta }$ are highlighted in bold. It is worth noting that CML estimation via numerical maximization depends heavily on the initial parameter values. If the initial values are selected too far from the actual values, the global maximum may not be found. In order to overcome this, we have selected the starting values equal to the CLS parameter estimates.
As can be seen in Table 1, the estimated values of ${\alpha _{j}}$ and ${\lambda _{j}}$, $j=1,2$, have a smaller bias and MSE when the parameters are estimated via CML. On the other hand, estimation of θ via CLS exhibits a smaller MSE in the Frank copula case for smaller samples. For larger samples, the estimates of θ obtained by the Two-step estimation method are very close to the CML estimates in terms of MSE and bias, and are closer to the true parameter values than the CLS estimates. Furthermore, since in the Two-step estimation the numerical maximization is carried out over a single parameter θ, the initial parameter values have less effect on the numerical maximization.
Table 2.
Monte Carlo simulation results for a BINAR(1) model with one innovation following a Poisson distribution and the other – a negative binomial one, where both innovations are linked by the FGM, Frank or Clayton copula
| Copula | Sample size | Parameter | True value | CLS MSE | CLS Bias | CML MSE | CML Bias | Two-Step MSE | Two-Step Bias |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FGM | $N=50$ | ${\alpha _{1}}$ | 0.6 | 0.01895 | −0.05858 | 0.00845 | −0.01513 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.01936 | −0.04902 | 0.00767 | −0.01953 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.12940 | 0.12812 | 0.05424 | 0.01879 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.39724 | 0.15151 | 0.24138 | 0.04833 | – | – |
| | | θ | −0.5 | 0.31467 | 0.14070 | **0.29415** | 0.06674 | 0.29949 | 0.09693 |
| | | ${\sigma _{2}^{2}}$ | 9 | 27.87327 | 1.15731 | 15.12863 | −0.14888 | 21.68229 | 0.72326 |
| | $N=500$ | ${\alpha _{1}}$ | 0.6 | 0.00156 | −0.00695 | 0.00076 | −0.00153 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.00194 | −0.00373 | 0.00053 | 0.00016 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.01041 | 0.01201 | 0.00543 | 0.00290 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.03882 | 0.01843 | 0.02362 | −0.00057 | – | – |
| | | θ | −0.5 | 0.06670 | −0.02014 | **0.04298** | −0.00268 | 0.04313 | 0.00562 |
| | | ${\sigma _{2}^{2}}$ | 9 | 6.24237 | −1.99232 | 1.81265 | 0.00611 | 1.85222 | −0.03506 |
| Frank | $N=50$ | ${\alpha _{1}}$ | 0.6 | 0.02049 | −0.06064 | 0.00912 | −0.01594 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.01951 | −0.04936 | 0.00772 | −0.02070 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.13769 | 0.13467 | 0.05748 | 0.02280 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.40626 | 0.15408 | 0.23717 | 0.05534 | – | – |
| | | θ | −1 | 1.81788 | 0.12516 | 1.75638 | −0.01239 | **1.68019** | 0.06211 |
| | | ${\sigma _{2}^{2}}$ | 9 | 25.10400 | 0.49423 | 14.86812 | −0.10034 | 21.92090 | 0.74026 |
| | $N=500$ | ${\alpha _{1}}$ | 0.6 | 0.00161 | −0.00702 | 0.00075 | −0.00239 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.00187 | −0.00364 | 0.00050 | −0.00046 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.01093 | 0.01652 | 0.00562 | 0.00501 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.03728 | 0.01217 | 0.02335 | 0.00203 | – | – |
| | | θ | −1 | 0.31942 | −0.05593 | **0.18960** | −0.01481 | 0.1902 | −0.0079 |
| | | ${\sigma _{2}^{2}}$ | 9 | 4.82620 | −1.75765 | 1.83082 | 0.02144 | 1.85852 | −0.02690 |
| Clayton | $N=50$ | ${\alpha _{1}}$ | 0.6 | 0.01987 | −0.06159 | 0.00903 | −0.01671 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.01879 | −0.04928 | 0.00632 | −0.01644 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.13479 | 0.14072 | 0.06096 | 0.03052 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.40675 | 0.14807 | 0.23171 | 0.02871 | – | – |
| | | θ | 1 | 0.78497 | 0.07464 | 0.67837 | 0.21235 | **0.57454** | 0.10972 |
| | | ${\sigma _{2}^{2}}$ | 9 | 24.40051 | 0.17321 | 15.29879 | −0.08379 | 23.73506 | 0.73754 |
| | $N=500$ | ${\alpha _{1}}$ | 0.6 | 0.00153 | −0.00722 | 0.00075 | −0.00197 | – | – |
| | | ${\alpha _{2}}$ | 0.4 | 0.00196 | −0.00385 | 0.00047 | −0.00083 | – | – |
| | | ${\lambda _{1}}$ | 1 | 0.01036 | 0.01745 | 0.00517 | 0.00409 | – | – |
| | | ${\lambda _{2}}$ | 2 | 0.03999 | 0.01227 | 0.02304 | 0.00110 | – | – |
| | | θ | 1 | 0.09927 | 0.04408 | **0.05557** | 0.03556 | 0.05559 | 0.02310 |
| | | ${\sigma _{2}^{2}}$ | 9 | 2.95995 | −0.68733 | 1.79836 | 0.01348 | 1.87740 | −0.02407 |
Table 2 demonstrates the estimation results when one innovation has a Poisson distribution and the other has a negative binomial one. With the inclusion of an additional variance parameter, the CLS estimation methods exhibit larger MSE and bias than the CML and Two-step estimation methods, for both the dependence and variance parameter estimates. Furthermore, the MSE of ${\hat{\sigma }_{2}^{2}}$ is smallest when the CML estimation method is used. On the other hand, both the Two-step and CML estimation methods produce similar estimates of θ in terms of MSE, regardless of sample size and copula function.
We can conclude that it is possible to accurately estimate the dependence parameter via CML using the CLS estimates of ${\hat{\alpha }_{j}}$ and ${\hat{\lambda }_{j}}$. The resulting $\hat{\theta }$ will be closer to the actual value of θ than ${\hat{\theta }^{\mathrm{CLS}}}$ and will not differ much from ${\hat{\theta }^{\mathrm{CML}}}$. Additional inference on the bias of the estimates can be found in Appendix A.
5 Application to loan default data
In this section we estimate a BINAR(1) model for empirical data, with the joint distribution of the innovations modelled via a copula. The data set consists of loan data which includes loans that have defaulted and loans that were repaid without missing any payments (non-defaulted loans). We will analyse and model the dependence between defaulted and non-defaulted loans as well as the presence of autocorrelation.
5.1 Loan default data
The data sample used is from Bondora, an Estonian peer-to-peer lending company. In November of 2014 Bondora introduced a loan rating system which assigns loans to different groups based on their risk level. There are 8 groups, ranging from the lowest risk group, ‘AA’, to the highest risk group, ‘HR’. However, the loan rating system could not be applied to most older loans due to a lack of data needed for Bondora’s rating model. Although Bondora issues loans in 4 different countries (Estonia, Finland, Slovakia and Spain), we will only focus on the loans issued in Spain. Since a new rating model implies new rules for accepting or rejecting loans, we have selected the data sample from 21 October 2013 (the date from which all issued loans had a rating assigned to them) to 1 January 2016. The time series are displayed in Figure 1. We are analysing data consisting of 115 weekly records.
The loan statistics are provided in Table 3:
Table 3.
Summary statistics of the weekly data of defaulted and non-defaulted loans issued in Spain
| | min | max | mean | variance |
| --- | --- | --- | --- | --- |
| DefaultedLoans | 1.00 | 60.00 | 22.60 | 158.66 |
| CompletedLoans | 0.00 | 15.00 | 5.30 | 11.67 |
The mean, minimum, maximum and variance are higher for defaulted loans than for non-defaulted loans. As can be seen from Figure 1, the numbers of defaulted and non-defaulted loans might be correlated since they both exhibit increase and decrease periods at the same times.
The correlation between the two time series is 0.6684. We also note that the mean and variance are lower at the beginning of the time series. This feature could be due to various reasons: the effect of the new loan rating system, which was officially implemented in December of 2014, the effect of advertising, or the fact that the amount of loans issued to people living outside of Estonia increased. The analysis of the significance of these effects is left for future research.
The sample autocorrelation (AC) function and the partial autocorrelation (PAC) function are displayed in Figure 2. We can see that the AC function decays over time and the PAC function has a significant first lag, which indicates that the non-negative integer-valued time series could be autocorrelated.
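Such diagnostics are immediate in R; a minimal sketch, assuming the weekly counts are stored in vectors named defaulted and completed (names ours):

```r
# Sample AC and PAC functions of the two weekly series (cf. Figure 2)
acf(defaulted);  pacf(defaulted)
acf(completed);  pacf(completed)
```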
In order to analyse whether the amount of defaulted loans depends on the amount of non-defaulted loans issued in the same week, we will consider a BINAR(1) model with different copulas for the innovations. For the marginal distributions of the innovations we will consider the Poisson distribution as well as the negative binomial one. Our focus is the estimation of the dependence parameter, and we will use the Two-step estimation method, based on the Monte Carlo simulation results presented in Section 4.
5.2 Estimated models
We estimated a number of BINAR(1) models with different distributions of innovations, which include combinations of:
- marginal innovation distributions: Poisson or negative binomial (for each of the two loan series);
- copulas linking the innovations: FGM, Frank or Clayton.
In the first step of the Two-step method, we estimated ${\hat{\alpha }_{1}}$ and ${\hat{\lambda }_{1}}$ for non-defaulted loans, and ${\hat{\alpha }_{2}}$ and ${\hat{\lambda }_{2}}$ for defaulted loans via CLS. The results are provided in Table 4 with standard errors for the Poisson case in parentheses:
Table 4.
Parameter estimates for the BINAR(1) model via the Two-step estimation method: CLS parameter estimates from the first step, with standard errors for the Poisson marginal distribution case in parentheses
| ${\hat{\alpha }_{1}}$ | ${\hat{\alpha }_{2}}$ | ${\hat{\lambda }_{1}}$ | ${\hat{\lambda }_{2}}$ |
| --- | --- | --- | --- |
| 0.53134 | 0.75581 | 2.52174 | 5.58940 |
| (0.08151) | (0.06163) | (0.45012) | (1.41490) |
Because the CLS estimates of the parameters ${\alpha _{j}}$ and ${\lambda _{j}}$, $j=1,2$, do not depend on the selected copula and marginal distribution family, these estimates remain the same for each of the different distribution combinations for the innovations. We can see that defaulted loans exhibit a higher degree of autocorrelation than non-defaulted loans do, due to a larger value of ${\hat{\alpha }_{2}}$. The innovation mean parameter for defaulted loans is also higher, which indicates that random shocks have a larger effect on the number of defaulted loans.
The parameter estimation results from the second step are provided in Table 5 with standard errors in parentheses. ${\hat{\sigma }_{1}^{2}}$ is the innovation variance estimate of non-defaulted loans and ${\hat{\sigma }_{2}^{2}}$ is the innovation variance estimate of defaulted loans. According to [16], the observed Fisher information is the negative Hessian matrix evaluated at the maximum likelihood estimate (MLE). The asymptotic standard errors reported in Table 5 are derived under the assumption that ${\alpha _{j}}$ and ${\lambda _{j}}$, $j=1,2$, are known, ignoring that the true values are substituted in the second step with their CLS estimates.
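As a sketch of how such second-step standard errors can be obtained numerically (our own illustration; loglik_theta stands for a user-written profile of the conditional log-likelihood in θ, with the first-step estimates plugged in):

```r
# Second step: minimize the negative conditional log-likelihood over theta only,
# with alpha_j and lambda_j fixed at their CLS values; the observed Fisher
# information is the Hessian of the negative log-likelihood at the optimum.
negll <- function(theta) -loglik_theta(theta)          # loglik_theta: hypothetical
fit   <- optim(par = 0.5, fn = negll, method = "BFGS", hessian = TRUE)
theta_hat <- fit$par
se_theta  <- sqrt(diag(solve(fit$hessian)))            # asymptotic standard error
```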
From the results in Table 5 we see that, according to the Akaike information criterion (AIC) and log-likelihood values, in most cases the FGM copula most accurately describes the relationship between the innovations of defaulted and non-defaulted loans, with the Frank copula being very close in terms of the AIC value. The Clayton copula is the least accurate in describing the innovation joint distribution, when compared to the FGM and Frank copula cases, which indicates that defaulted and non-defaulted loans do not exhibit strong left tail dependence.
Since the summary statistics of the data sample showed that the variance of the data is larger than the mean, a negative binomial marginal distribution may provide a better fit. Additionally, because copulas can link different marginal distributions, it is interesting to see if copulas with different discrete marginal distributions would also improve the model fit. BINAR(1) models where non-defaulted loan innovations are modelled with negative binomial distributions and defaulted loan innovations are modelled with Poisson marginal distributions, and vice versa, were estimated. In general, changing one of the marginal distributions to a negative binomial provides a better fit in terms of AIC than the Poisson marginal distribution case. However, the smallest AIC value is achieved when both marginal distributions are modelled with negative binomial distributions, linked via the FGM copula. Furthermore, the estimated innovation variance, ${\hat{\sigma }_{2}^{2}}$, is much larger for defaulted loans, and this is similar to what we observed from the defaulted loan data summary statistics.
Table 5.
Parameter estimates for the BINAR(1) model via the Two-step estimation method: CML parameter estimates from the second step for different innovation marginal and joint distribution combinations, with standard errors in parentheses, derived under the assumption that the values ${\hat{\lambda }_{j}}$ and ${\hat{\alpha }_{j}}$, $j=1,2$, from the first step are the true values
| Marginals | Copula | $\hat{\theta }$ | ${\hat{\sigma }_{1}^{2}}$ | ${\hat{\sigma }_{2}^{2}}$ | AIC | Log-likelihood |
| --- | --- | --- | --- | --- | --- | --- |
| Both Poisson | FGM | 0.89270 | – | – | 1763.48096 | −880.74048 |
| | | (0.18671) | | | | |
| | Frank | 2.38484 | – | – | 1760.15692 | −879.07846 |
| | | (0.53367) | | | | |
| | Clayton | 0.39357 | – | – | 1761.12369 | −879.56185 |
| | | (0.11697) | | | | |
| Negative binomial and Poisson | FGM | 1.00000 | 6.46907 | – | 1731.57339 | −863.78670 |
| | | (0.22914) | (1.01114) | | | |
| | Frank | 2.14329 | 6.10242 | – | 1731.95241 | −863.97620 |
| | | (0.45100) | (1.15914) | | | |
| | Clayton | 0.34540 | 5.73731 | – | 1736.47641 | −866.23821 |
| | | (0.12859) | (0.52831) | | | |
| Poisson and negative binomial | FGM | 1.00000 | – | 44.83107 | 1498.29563 | −747.14782 |
| | | (0.26357) | | (7.37423) | | |
| | Frank | 2.01486 | – | 44.10555 | 1498.81039 | −747.40519 |
| | | (0.61734) | | (7.33169) | | |
| | Clayton | 0.38310 | – | 43.42739 | 1503.55388 | −749.77694 |
| | | (0.17376) | | (7.29842) | | |
| Both negative binomial | FGM | 1.00000 | 6.55810 | 45.36834 | 1466.15418 | −730.07709 |
| | | (0.31675) | (1.24032) | (7.55217) | | |
| | Frank | 2.21356 | 6.58754 | 45.42601 | 1466.97947 | −730.48973 |
| | | (0.68192) | (1.26126) | (7.57743) | | |
| | Clayton | 0.55939 | 6.64478 | 45.78307 | 1470.73515 | −732.36758 |
| | | (0.24652) | (1.25833) | (7.66324) | | |
Overall, both the Frank and FGM copulas provide a similar fit in terms of log-likelihood, regardless of the selected marginal distributions. We note, however, that in some FGM copula cases the estimated value of the parameter θ is equal to the maximal attainable value 1. As discussed in Section 3, the FGM copula can only model weak dependence. Given a larger sample size, the Frank copula might be more appropriate because it can capture stronger dependence than the FGM copula can. In the case where both marginals are negative binomial, the Frank copula estimate $\hat{\theta }=2.21356$ indicates that there is a positive dependence between defaulted and non-defaulted loans, just as in the FGM copula case.
6 Conclusions
The Monte Carlo analysis of the different estimation methods shows that, although the CML estimates of the BINAR(1) parameters have the smallest MSE and bias, the differences in MSE and bias between the estimates of the dependence parameter obtained by the different methods are small, indicating that the estimation of the dependence parameter does not depend strongly on the chosen method. While the CML estimates exhibit the smallest MSE, their calculation via numerical optimization relies on the selection of the initial parameter values. These values can be selected via CLS estimation.
An empirical application of BINAR models for loan data shows that, regardless of the selected marginal distributions, the FGM copula provides the best model fit in almost all cases. Models with the Frank copula are similar to FGM copula models in terms of AIC values. For some of these cases, the estimated FGM copula dependence parameter value was equal to the maximum that can be attained by an FGM copula. In such cases, a larger sample size could help to determine whether the FGM or Frank copula is more appropriate to model the dependence between amounts of defaulted and non-defaulted loans.
Although selecting marginal distributions from different families (Poisson and negative binomial) provided better models than those with only Poisson marginal distributions, the models with both marginal distributions modelled via negative binomial distributions provide the smallest AIC values, which reflects overdispersion in the amounts of both defaulted and non-defaulted loans. The FGM copula, which provides the best model fit, models variables which exhibit weak dependence. Furthermore, the estimated copula dependence parameter indicates that the dependence between the amounts of defaulted and non-defaulted loans is positive.
Finally, one could apply other copulas in order to analyse whether the loan data exhibits forms of dependence different from the ones discussed in this paper. The approach can also be extended by analysing the presence of structural changes within the data, by checking for seasonality, or by extending the BINAR(1) model with copula-linked innovations to account for the past values of other time series rather than only its own.