A bivariate integer-valued autoregressive process of order 1 (BINAR(1)) with copula-joint innovations is studied. Different parameter estimation methods are analyzed and compared via Monte Carlo simulations, with emphasis on estimation of the copula dependence parameter. An empirical application to defaulted and non-defaulted loan data is carried out using different combinations of copula functions and marginal distribution functions, covering the cases where both marginal distributions are from the same family as well as the case where they are from different distribution families.
Financial institutions that issue loans follow company-specific (and/or country-defined) rules, which act as a safeguard against issuing loans to people who are known to be insolvent. However, striving for higher profits might motivate some companies to issue loans to higher-risk clients. Usually a company’s methods for evaluating loan risk are not publicly available. One way to evaluate whether too many knowingly very high-risk loans are issued, and whether insolvent clients are adequately separated from responsible clients, is to look at the quantity of defaulted and non-defaulted loans issued each day. The adequacy of a company’s rules for issuing loans can be analysed by modelling, via copulas, the dependence between the number of defaulted loans and the number of non-defaulted loans. The advantage of such an approach is that copulas allow one to model the marginal distributions (possibly from different distribution families) and their dependence structure (which is described via a copula) separately. Because of this feature, copulas have been applied in many different fields, including survival analysis, hydrology, insurance risk analysis as well as finance (for examples of copula applications, see [
The dependence of the default rate of loans on different credit risk categories was analysed in [
In this paper we expand on the use of copulas in BINAR models by analysing additional copula families for the innovations of the BINAR(1) model and by analysing different methods for BINAR(1) model parameter estimation. We also present a two-step method for the parameter estimation of the BINAR(1) model, where we estimate the model parameters separately from the dependence parameter of the copula. These estimation methods (including the one used in [
The paper is organized as follows. Section
The BINAR(1) process was introduced in [
Let
Properties of the thinning operator are provided in [
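As an illustration of the thinning operator, the following minimal sketch assumes the standard binomial thinning used in INAR-type models, where thinning a count x with probability alpha is the sum of x independent Bernoulli(alpha) variables. The function name `thin` and the demonstration values are hypothetical and not taken from the paper.

```python
import random

def thin(alpha, x, rng=random):
    """Binomial thinning (assumed definition): number of successes among
    x independent Bernoulli(alpha) trials, so the result lies in 0..x."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return sum(1 for _ in range(x) if rng.random() < alpha)

# Illustrative check: the average of many thinned values is close to alpha * x.
rng = random.Random(1)
draws = [thin(0.6, 10, rng) for _ in range(2000)]
mean_draw = sum(draws) / len(draws)
```

Under this definition the thinned value is binomially distributed given x, with conditional mean alpha * x, which is why the average above should be near 6.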
Now we present some properties of the BINAR(1) model. They will be used when analysing some of the parameter estimation methods. The proofs of these properties can be easily derived and some of them are provided in [
Similarly to (
Hence, the distributional properties of the BINAR(1) process can be studied in terms of
From the covariance and correlation (see
In this section we recall the definition and main properties of bivariate copulas, mainly following [
Copulas are used for modelling the dependence between several random variables. The main advantage of using copulas is that they allow one to model the marginal distributions separately from their joint distribution. In this paper we use two-dimensional copulas, which are defined as follows:
A 2-dimensional copula
for every
for every
for any
The theoretical foundation of copulas is given by Sklar’s theorem:
If a pair of random variables
Since the innovations of a BINAR(1) model are non-negative integer-valued random variables, one needs to consider copulas linking discrete distributions. In this section we mention some of the key differences that arise when copula marginals are discrete rather than continuous.
Firstly, as mentioned in Theorem
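With discrete marginals the joint pmf cannot be obtained by differentiating the copula; instead it follows from differencing the joint cdf over a unit rectangle. The sketch below illustrates this for Poisson marginals joined by an FGM copula. The helper names (`poisson_cdf`, `fgm`, `joint_pmf`) and the parameter values are assumptions chosen for illustration only.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = total = math.exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)

def fgm(u, v, theta):
    """FGM copula cdf in its standard form, theta in [-1, 1]."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def joint_pmf(x, y, lam1, lam2, theta):
    """P(X = x, Y = y) as a rectangle difference of C(F1(.), F2(.))."""
    f1_hi, f1_lo = poisson_cdf(x, lam1), poisson_cdf(x - 1, lam1)
    f2_hi, f2_lo = poisson_cdf(y, lam2), poisson_cdf(y - 1, lam2)
    return (fgm(f1_hi, f2_hi, theta) - fgm(f1_lo, f2_hi, theta)
            - fgm(f1_hi, f2_lo, theta) + fgm(f1_lo, f2_lo, theta))

# Illustrative checks: the pmf sums to one and keeps the Poisson marginal.
total = sum(joint_pmf(x, y, 1.0, 2.0, -0.5) for x in range(40) for y in range(40))
marginal_x0 = sum(joint_pmf(0, y, 1.0, 2.0, -0.5) for y in range(40))
```

Setting theta to zero in this sketch reduces the joint pmf to the product of the two Poisson pmfs, which is a convenient sanity check.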
In this section we will present several bivariate copulas, which will be used later when constructing and evaluating the BINAR(1) model. For all the copulas discussed, the following notation is used:
The Farlie–Gumbel–Morgenstern (FGM) copula has the following form:
The Frank copula has the following form:
The Clayton copula has the following form:
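For reference, the three families named above can be evaluated numerically as in the following sketch. It uses the standard textbook parameterizations (FGM with theta in [-1, 1], Frank with theta nonzero, Clayton restricted here to theta > 0); these are assumptions and may differ in detail from the parameterization used in the paper.

```python
import math

def fgm_copula(u, v, theta):
    """FGM: C(u, v) = u v [1 + theta (1 - u)(1 - v)], theta in [-1, 1]."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def frank_copula(u, v, theta):
    """Frank: C(u, v) = -(1/theta) log(1 + (e^{-theta u} - 1)(e^{-theta v} - 1)
    / (e^{-theta} - 1)), theta != 0."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def clayton_copula(u, v, theta):
    """Clayton: C(u, v) = (u^{-theta} + v^{-theta} - 1)^{-1/theta}, theta > 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

# Evaluate each copula at one point, using the dependence values that appear
# as true values in the simulation tables (-0.5, -1 and 1).
values = {
    "FGM": fgm_copula(0.3, 0.7, -0.5),
    "Frank": frank_copula(0.3, 0.7, -1.0),
    "Clayton": clayton_copula(0.3, 0.7, 1.0),
}
```

All three functions satisfy the boundary conditions C(u, 1) = u and C(0, v) = 0 required of a copula.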
In this section we examine different BINAR(1) model parameter estimation methods and provide a two-step method for separate estimation of the copula dependence parameter. Estimation methods are compared via Monte Carlo simulations. Let
The conditional least squares (CLS) estimator minimizes the squared distance between
Using Theorem
For the Poisson marginal distribution case the asymptotic variance matrix can be expressed as (see [
Note that
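The CLS idea for a single component can be sketched as follows, assuming the conditional mean of the component given its previous value is alpha times the previous value plus the innovation mean lambda, so that CLS reduces to ordinary least squares of each observation on its predecessor. The function name `cls_inar1` and the example series are hypothetical.

```python
def cls_inar1(series):
    """Return (alpha_hat, lambda_hat) minimising
    sum_t (x_t - alpha * x_{t-1} - lambda)^2 for one component."""
    prev, curr = series[:-1], series[1:]
    n = len(curr)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    alpha_hat = sxy / sxx                       # least-squares slope
    lambda_hat = mean_curr - alpha_hat * mean_prev  # least-squares intercept
    return alpha_hat, lambda_hat

# Illustrative call on a made-up count series.
alpha_hat, lambda_hat = cls_inar1([3, 5, 4, 6, 5, 7, 4, 3, 5, 6])
```

In this sketch the two components of the bivariate series would be fitted separately. The intercept estimates the innovation mean, which coincides with the Poisson parameter only in the Poisson case.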
Assume that the joint pmf of
Assume that the joint pmf of
Assume now that the Poisson innovations
Our estimation method is based on the approximation of covariance
For the FGM copula, if we take the derivative of the sum
Depending on the selected copula family, calculation of (
BINAR(1) models can be estimated via conditional maximum likelihood (CML) (see [
In the case of a copula-based BINAR(1) model with Poisson marginals,
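A minimal sketch of the conditional likelihood is given below. It assumes binomial thinning applied to each component separately, Poisson innovations and an FGM copula, so that the transition probability is a convolution of two binomial pmfs with the joint innovation pmf. All function names and the small data set are illustrative assumptions.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = total = math.exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)

def fgm(u, v, theta):
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def innov_pmf(x, y, lam1, lam2, theta):
    """Joint innovation pmf as a rectangle difference of the copula cdf."""
    a_hi, a_lo = poisson_cdf(x, lam1), poisson_cdf(x - 1, lam1)
    b_hi, b_lo = poisson_cdf(y, lam2), poisson_cdf(y - 1, lam2)
    return (fgm(a_hi, b_hi, theta) - fgm(a_lo, b_hi, theta)
            - fgm(a_hi, b_lo, theta) + fgm(a_lo, b_lo, theta))

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def cond_pmf(curr, prev, a1, a2, lam1, lam2, theta):
    """P(X_t = curr | X_{t-1} = prev): sum over the numbers of survivors
    (k, l) of the two thinnings; the remainder is the innovation pair."""
    total = 0.0
    for k in range(min(curr[0], prev[0]) + 1):
        for l in range(min(curr[1], prev[1]) + 1):
            total += (binom_pmf(k, prev[0], a1) * binom_pmf(l, prev[1], a2)
                      * innov_pmf(curr[0] - k, curr[1] - l, lam1, lam2, theta))
    return total

def cond_loglik(data, a1, a2, lam1, lam2, theta):
    """Conditional log-likelihood, conditioning on the first observation."""
    return sum(math.log(cond_pmf(data[t], data[t - 1], a1, a2, lam1, lam2, theta))
               for t in range(1, len(data)))

# Illustrative evaluation on a made-up series of pairs.
data = [(2, 3), (1, 4), (3, 2), (2, 2)]
ll = cond_loglik(data, 0.6, 0.4, 1.0, 2.0, -0.5)
```

For the FGM copula the innovation pmf is linear in theta, so in this sketch the conditional log-likelihood is concave in theta when the other parameters are held fixed.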
As for the CLS estimator, in other cases, where the marginal distribution has parameters other than
Depending on the range of attainable values of the parameters and the sample size, CML maximization might take some time to compute. On the other hand, since CLS estimators of
Summarizing, the two-step approach to estimating unknown parameters is to find
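The structure of the two-step approach can be sketched as follows: first-step estimates are obtained per component and held fixed, and the second step is a one-dimensional maximization over the dependence parameter. The golden-section search is an assumption (the source does not name an optimizer), and `toy_cls` and `toy_loglik` are hypothetical stand-ins for a CLS routine and a conditional log-likelihood.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def two_step(series1, series2, cls, loglik, theta_bounds):
    """Step 1: per-component first-step estimates (alpha_j, lambda_j).
    Step 2: maximise the log-likelihood over theta only."""
    a1, lam1 = cls(series1)
    a2, lam2 = cls(series2)
    data = list(zip(series1, series2))
    theta_hat = golden_section_max(
        lambda th: loglik(data, a1, a2, lam1, lam2, th), *theta_bounds)
    return a1, a2, lam1, lam2, theta_hat

# Hypothetical stand-ins, used only to exercise the control flow.
def toy_cls(series):
    return 0.5, 1.0

def toy_loglik(data, a1, a2, lam1, lam2, theta):
    return -(theta - 0.3) ** 2

result = two_step([1, 2, 3], [2, 2, 1], toy_cls, toy_loglik, (-1.0, 1.0))
```

When the objective is increasing over the whole interval, the search returns a value at the upper bound, which is how a boundary estimate of the dependence parameter would arise in this sketch.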
We carried out a Monte Carlo simulation with 1000 replications to test the estimation methods with sample sizes 50 and 500. The generated model was a BINAR(1) with innovations joined by either the FGM, Frank or Clayton copula with Poisson marginal distributions, as well as with marginal distributions from different families: one a Poisson distribution and the other a negative binomial one. Note that for the two-step method only the estimates of
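One replication of such a simulation could be generated as in the sketch below, which assumes binomial thinning, Poisson innovations and an FGM copula sampled by conditional inversion. The mapping of the tabulated true values 0.6, 0.4, 1 and 2 to the two thinning probabilities and the two Poisson means is an assumption, as are the function names, the seed and the burn-in length.

```python
import math
import random

def poisson_quantile(p, lam):
    """Smallest k with P(X <= k) >= p for X ~ Poisson(lam)."""
    p = min(p, 1.0 - 1e-12)   # guard against rounding at the upper tail
    k = 0
    term = cdf = math.exp(-lam)
    while cdf < p:
        k += 1
        term *= lam / k
        cdf += term
    return k

def fgm_pair(theta, rng):
    """Draw (u, v) from the FGM copula: solve the conditional cdf
    v + a v (1 - v) = w for v, where a = theta (1 - 2u)."""
    u, w = rng.random(), rng.random()
    a = theta * (1.0 - 2.0 * u)
    v = 2.0 * w / (1.0 + a + math.sqrt((1.0 + a) ** 2 - 4.0 * a * w))
    return u, v

def thin(alpha, x, rng):
    """Binomial thinning: successes among x Bernoulli(alpha) trials."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def simulate_binar1(n, a1, a2, lam1, lam2, theta, seed=0, burn_in=100):
    """Simulate n pairs after discarding burn_in initial pairs."""
    rng = random.Random(seed)
    x1 = x2 = 0
    path = []
    for t in range(n + burn_in):
        u, v = fgm_pair(theta, rng)
        x1 = thin(a1, x1, rng) + poisson_quantile(u, lam1)
        x2 = thin(a2, x2, rng) + poisson_quantile(v, lam2)
        if t >= burn_in:
            path.append((x1, x2))
    return path

path = simulate_binar1(500, 0.6, 0.4, 1.0, 2.0, -0.5, seed=42)
mean1 = sum(p[0] for p in path) / len(path)
```

Under these assumptions the stationary mean of the first component is lam1 / (1 - a1) = 2.5, so `mean1` should be close to that value for a sample of this size.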
Monte Carlo simulation results for a BINAR(1) model with Poisson innovations linked by the FGM, Frank or Clayton copula
Copula  Sample size  Parameter  True value  CLS  CML  Two-step  
MSE  Bias  MSE  Bias  MSE  Bias  
FGM  0.6  0.01874  −0.05823  0.00887  −0.01789  –  –  
0.4  0.02033  −0.05223  0.01639  −0.02751  –  –  
1  0.12983  0.13325  0.06514  0.03366  –  –  
2  0.25625  0.16029  0.19939  0.07597  –  –  
−0.5  0.12568  0.33840  0.07568  0.3311  0.0876  
0.6  0.00147  −0.00432  0.00073  −0.00122  –  –  
0.4  0.00184  −0.00505  0.00129  −0.00157  –  –  
1  0.01012  0.00968  0.00556  0.00215  –  –  
2  0.02413  0.01843  0.01763  0.00678  –  –  
−0.5  0.04679  0.00668  0.04271  −0.00700  −0.00443  
Frank  0.6  0.02023  −0.06039  0.00950  −0.01965  –  –  
0.4  0.02005  −0.05251  0.01630  −0.02858  –  –  
1  0.13562  0.13536  0.06740  0.03625  –  –  
2  0.25687  0.16392  0.19975  0.08291  –  –  
−1  0.12394  2.05786  0.00860  1.97515  0.04216  
0.6  0.00153  −0.00595  0.00075  −0.00249  –  –  
0.4  0.00181  −0.00582  0.00129  −0.00132  –  –  
1  0.01033  0.01269  0.00550  0.00421  –  –  
2  0.02442  0.02129  0.01785  0.00629  –  –  
−1  0.22084  0.01746  0.20138  −0.01779  −0.01342  
Clayton  0.6  0.01826  −0.05489  0.00799  −0.013295  –  –  
0.4  0.01976  −0.05057  0.01585  −0.02427  –  –  
1  0.12679  0.12104  0.06080  0.01743  –  –  
2  0.25725  0.15704  0.19934  0.06499  –  –  
1  0.71845  0.02621  0.72581  0.22628  0.13283  
0.6  0.00146  −0.00518  0.00070  0.00016  –  –  
0.4  0.00189  −0.00350  0.00120  −0.00049  –  –  
1  0.00973  0.01137  0.00513  −0.00150  –  –  
2  0.02447  0.01113  0.01707  0.00065  –  –  
1  0.11578  0.03556  0.05864  0.04250  −0.01342 
The results for the Poisson marginal distribution case are provided in Table
As can be seen in Table
Monte Carlo simulation results for a BINAR(1) model with one innovation following a Poisson distribution and the other – a negative binomial one, where both innovations are linked by the FGM, Frank or Clayton copula
Copula  Sample size  Parameter  True value  CLS  CML  Two-step  
MSE  Bias  MSE  Bias  MSE  Bias  
FGM  0.6  0.01895  −0.05858  0.00845  −0.01513  –  –  
0.4  0.01936  −0.04902  0.00767  −0.01953  –  –  
1  0.12940  0.12812  0.05424  0.01879  –  –  
2  0.39724  0.15151  0.24138  0.04833  –  –  
−0.5  0.31467  0.14070  0.06674  0.29949  0.09693  
9  27.87327  1.15731  15.12863  −0.14888  21.68229  0.72326  
0.6  0.00156  −0.00695  0.00076  −0.00153  –  –  
0.4  0.00194  −0.00373  0.00053  0.00016  –  –  
1  0.01041  0.01201  0.00543  0.00290  –  –  
2  0.03882  0.01843  0.02362  −0.00057  –  –  
−0.5  0.06670  −0.02014  −0.00268  0.04313  0.00562  
9  6.24237  −1.99232  1.81265  0.00611  1.85222  −0.03506  
Frank  0.6  0.02049  −0.06064  0.00912  −0.01594  –  –  
0.4  0.01951  −0.04936  0.00772  −0.02070  –  –  
1  0.13769  0.13467  0.05748  0.02280  –  –  
2  0.40626  0.15408  0.23717  0.05534  –  –  
−1  1.81788  0.12516  1.75638  −0.01239  0.06211  
9  25.10400  0.49423  14.86812  −0.10034  21.92090  0.74026  
0.6  0.00161  −0.00702  0.00075  −0.00239  –  –  
0.4  0.00187  −0.00364  0.00050  −0.00046  –  –  
1  0.01093  0.01652  0.00562  0.00501  –  –  
2  0.03728  0.01217  0.02335  0.00203  –  –  
−1  0.31942  −0.05593  −0.01481  0.1902  −0.0079  
9  4.82620  −1.75765  1.83082  0.02144  1.85852  −0.02690  
Clayton  0.6  0.01987  −0.06159  0.00903  −0.01671  –  –  
0.4  0.01879  −0.04928  0.00632  −0.01644  –  –  
1  0.13479  0.14072  0.06096  0.03052  –  –  
2  0.40675  0.14807  0.23171  0.02871  –  –  
1  0.78497  0.07464  0.67837  0.21235  0.10972  
9  24.40051  0.17321  15.29879  −0.08379  23.73506  0.73754  
0.6  0.00153  −0.00722  0.00075  −0.00197  –  –  
0.4  0.00196  −0.00385  0.00047  −0.00083  –  –  
1  0.01036  0.01745  0.00517  0.00409  –  –  
2  0.03999  0.01227  0.02304  0.00110  –  –  
1  0.09927  0.04408  0.03556  0.05559  0.02310  
9  2.95995  −0.68733  1.79836  0.01348  1.87740  −0.02407 
Table
We can conclude that it is possible to accurately estimate the dependence parameter via CML using the CLS estimates of
In this section we estimate, for empirical data, a BINAR(1) model with the joint innovation distribution modelled by a copula cdf. The data set consists of loan data which includes loans that have defaulted and loans that were repaid without missing any payments (non-defaulted loans). We will analyse and model the dependence between defaulted and non-defaulted loans as well as the presence of autocorrelation.
The data sample used is from Bondora, an Estonian peer-to-peer lending company. In November of 2014 Bondora introduced a loan rating system which assigns loans to different groups, based on their risk level. There are 8 groups ranging from the lowest risk group, ‘AA’, to the highest risk group, ‘HR’. However, the loan rating system could not be applied to most older loans due to a lack of data needed for Bondora’s rating model. Although Bondora issues loans in 4 different countries (Estonia, Finland, Slovakia and Spain), we will only focus on the loans issued in Spain. Since a new rating model indicates new rules for accepting or rejecting loans, we have selected the data sample from 21 October 2013 to 1 January 2016, because from the start date onward all loans had a rating assigned to them. The time series are displayed in Figure
‘CompletedLoans’ – the number of non-defaulted loans issued per week, i.e. loans which were repaid and have never defaulted (a loan that is 60 or more days overdue is considered defaulted);
‘DefaultedLoans’ – the number of defaulted loans issued per week.
Summary statistics of the weekly data of defaulted and non-defaulted loans issued in Spain
min  max  mean  variance  
DefaultedLoans  1.00  60.00  22.60  158.66 
CompletedLoans  0.00  15.00  5.30  11.67 
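A quick way to read the summary statistics is the variance-to-mean ratio, which equals one for a Poisson distribution. The sketch below computes it from the tabulated means and variances; the dictionary layout and variable names are illustrative.

```python
# (mean, variance) pairs copied from the summary statistics table.
stats = {
    "DefaultedLoans": (22.60, 158.66),
    "CompletedLoans": (5.30, 11.67),
}

# Variance-to-mean ratio; values above 1 indicate overdispersion
# relative to a Poisson distribution.
dispersion = {name: var / mean for name, (mean, var) in stats.items()}
```

Both ratios exceed one (roughly 7.0 and 2.2), which is consistent with considering a negative binomial alternative to the Poisson distribution.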
Bondora loan data: non-defaulted and defaulted loans by their issue date
AC function and PAC function plots of Bondora loan data
The mean, minimum, maximum and variance are higher for defaulted loans than for non-defaulted loans. As can be seen from Figure
The correlation between the two time series is 0.6684. We also note that the mean and variance are lower at the beginning of the time series. This feature could be due to various reasons: the effect of the new loan rating system, which was officially implemented in December of 2014, the effect of advertising, or the fact that the number of loans issued to people living outside of Estonia increased. The analysis of the significance of these effects is left for future research.
The sample autocorrelation (AC) function and the partial autocorrelation (PAC) function are displayed in Figure
In order to analyse whether the number of defaulted loans depends on the number of non-defaulted loans in the same week, we will consider a BINAR(1) model with different copulas for the innovations. For the marginal distributions of the innovations we will consider the Poisson distribution as well as the negative binomial one. Our focus is the estimation of the dependence parameter, and we will use the two-step estimation method, based on the Monte Carlo simulation results presented in Section
We estimated a number of BINAR(1) models with different distributions of innovations which include combinations of:
different copula functions: FGM, Frank or Clayton;
different combinations of the Poisson and negative binomial distributions: both marginals are Poisson, both marginals are negative binomial, or a mix of both.
In the first step of the two-step method, we estimated
Parameter estimates for BINAR(1) model via the two-step estimation method: parameter CLS estimates from the first step with standard errors for the Poisson marginal distribution case in parentheses
0.53134  0.75581  2.52174  5.58940 
(0.08151)  (0.06163)  (0.45012)  (1.41490) 
Because the CLS estimation of parameters
The parameter estimation results from the second step are provided in Table
From the results in Table
Since the summary statistics of the data sample showed that the variance of the data is larger than the mean, a negative binomial marginal distribution may provide a better fit. Additionally, because copulas can link different marginal distributions, it is interesting to see if copulas with different discrete marginal distributions would also improve the model fit. BINAR(1) models where non-defaulted loan innovations are modelled with negative binomial distributions and defaulted loan innovations are modelled with Poisson marginal distributions, and vice versa, were estimated. In general, changing one of the marginal distributions to a negative binomial provides a better fit in terms of AIC than the Poisson marginal distribution case. However, the smallest AIC value is achieved when both marginal distributions are modelled with negative binomial distributions, linked via the FGM copula. Furthermore, the estimated innovation variance,
Parameter estimates for BINAR(1) model via the two-step estimation method: parameter CML estimates from the second step for different innovation marginal and joint distribution combinations with standard errors in parentheses, derived under the assumption that the values
Marginals  Copula  AIC  Log-likelihood  
Both Poisson  FGM  0.89270  –  –  1763.48096  −880.74048 
(0.18671)  
Frank  2.38484  –  –  1760.15692  −879.07846  
(0.53367)  
Clayton  0.39357  –  –  1761.12369  −879.56185  
(0.11697)  
Negative binomial and Poisson  FGM  1.00000  6.46907  –  1731.57339  −863.78670 
(0.22914)  (1.01114)  
Frank  2.14329  6.10242  –  1731.95241  −863.97620  
(0.45100)  (1.15914)  
Clayton  0.34540  5.73731  –  1736.47641  −866.23821  
(0.12859)  (0.52831)  
Poisson and negative binomial  FGM  1.00000  –  44.83107  1498.29563  −747.14782 
(0.26357)  (7.37423)  
Frank  2.01486  –  44.10555  1498.81039  −747.40519  
(0.61734)  (7.33169)  
Clayton  0.38310  –  43.42739  1503.55388  −749.77694  
(0.17376)  (7.29842)  
Both negative binomial  FGM  1.00000  6.55810  45.36834  − 

(0.31675)  (1.24032)  (7.55217)  
Frank  2.21356  6.58754  45.42601  1466.97947  −730.48973  
(0.68192)  (1.26126)  (7.57743)  
Clayton  0.55939  6.64478  45.78307  1470.73515  −732.36758  
(0.24652)  (1.25833)  (7.66324) 
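The AIC and log-likelihood columns of the table above can be cross-checked with the usual definition AIC = 2k - 2 log L. The sketch below assumes k counts only the parameters estimated in the second step (one for the Poisson-Poisson case, two for the mixed cases, three when both marginals are negative binomial); this count is inferred from the tabulated values rather than stated in the text.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * loglik

# (label, log-likelihood, assumed number of second-step parameters, reported AIC)
rows = [
    ("Both Poisson, FGM", -880.74048, 1, 1763.48096),
    ("Negative binomial and Poisson, FGM", -863.78670, 2, 1731.57339),
    ("Poisson and negative binomial, FGM", -747.14782, 2, 1498.29563),
    ("Both negative binomial, Frank", -730.48973, 3, 1466.97947),
]

# Largest absolute difference between recomputed and reported AIC values.
max_gap = max(abs(aic(ll, k) - reported) for _, ll, k, reported in rows)
```

Under this assumption the recomputed values agree with the reported ones up to rounding in the last printed digit.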
Overall, both Frank and FGM copulas provide similar fit in terms of log-likelihood, regardless of the selected marginal distributions. We note, however, that for some FGM copula cases, the estimated value of parameter
The Monte Carlo analysis of the different estimation methods shows that, although the CML estimates of the BINAR(1) parameters have the smallest MSE and bias, the MSE and bias of the dependence parameter estimates differ less across the estimation methods, indicating that estimates of the dependence parameter obtained via different methods do not exhibit large differences. While CML estimates exhibit the smallest MSE, their calculation via numerical optimization relies on the selection of the initial parameter values. These values can be selected via CLS estimation.
An empirical application of BINAR models to loan data shows that, regardless of the selected marginal distributions, the FGM copula provides the best model fit in almost all cases. Models with the Frank copula are similar to FGM copula models in terms of AIC values. For some of these cases, the estimated FGM copula dependence parameter value was equal to the maximum that can be attained by an FGM copula. In such cases, a larger sample size could help to determine whether the FGM or Frank copula is more appropriate to model the dependence between the numbers of defaulted and non-defaulted loans.
Although selecting marginal distributions from different families (Poisson or negative binomial) provided better models than those with only Poisson marginal distributions, the models with both marginal distributions modelled via negative binomial distributions provide the smallest AIC values, which reflects overdispersion in the numbers of both defaulted and non-defaulted loans. The FGM copula, which provides the best model fit, models variables which exhibit weak dependence. Furthermore, the estimated copula dependence parameter indicates that the dependence between the numbers of defaulted and non-defaulted loans is positive.
Finally, one can apply other copulas in order to analyse whether the loan data exhibit forms of dependence different from the ones discussed in this paper. The approach can also be extended by analysing the presence of structural changes within the data, by checking for seasonality, or by extending the BINAR(1) model with copula-joined innovations to account for the past values of other time series rather than only its own.
Standard errors of the bias of the estimated parameters from the Monte Carlo simulation
Copula  Sample size  Parameter  True value  CLS  CML  Two-step  
PP  PNB  PP  PNB  PP  PNB  
FGM  0.6  0.12396  0.12465  0.09252  0.09073  –  –  
0.4  0.13274  0.13029  0.12510  0.08541  –  –  
1  0.33494  0.33631  0.25311  0.23225  –  –  
2  0.48040  0.61210  0.44024  0.48916  –  –  
−0.5  0.53139  0.54330  0.57707  0.53850  0.56899  0.53887  
9  –  5.15368  –  3.88865  –  4.60221  
0.6  0.03813  0.03893  0.02706  0.02745  –  –  
0.4  0.04258  0.04392  0.03585  0.02306  –  –  
1  0.10018  0.10076  0.07455  0.07367  –  –  
2  0.15433  0.19676  0.13266  0.15377  –  –  
−0.5  0.21631  0.25760  0.20666  0.20741  0.20657  0.20770  
9  –  1.50841  –  1.34701  –  1.36119  
Frank  0.6  0.12882  0.12975  0.09552  0.09420  –  –  
0.4  0.13158  0.13073  0.12448  0.08543  –  –  
1  0.34266  0.34594  0.25719  0.23879  –  –  
2  0.47982  0.61879  0.43939  0.48409  –  –  
−1  1.34944  1.34314  1.43522  1.32589  1.40547  1.29538  
9  –  4.98845  –  3.85654  –  4.62540  
0.6  0.03862  0.03951  0.02734  0.02727  –  –  
0.4  0.04212  0.04312  0.03591  0.02240  –  –  
1  0.10091  0.10329  0.07409  0.07481  –  –  
2  0.15490  0.19278  0.13351  0.15287  –  –  
−1  0.46985  0.56268  0.44862  0.43540  0.44802  0.43627  
9  –  1.31856  –  1.35359  –  1.36369  
Clayton  0.6  0.12352  0.12684  0.08846  0.09360  –  –  
0.4  0.13123  0.12798  0.12361  0.07779  –  –  
1  0.33505  0.33926  0.24609  0.24514  –  –  
2  0.48252  0.62066  0.44194  0.48075  –  –  
1  0.84763  0.88328  0.82176  0.79618  0.77890  0.75037  
9  –  4.93912  –  3.91243  –  4.81812  
0.6  0.03782  0.03850  0.02641  0.02742  –  –  
0.4  0.04337  0.04410  0.03468  0.02176  –  –  
1  0.09804  0.10033  0.07162  0.07180  –  –  
2  0.15612  0.19969  0.13071  0.15185  –  –  
1  0.33857  0.31212  0.23852  0.23316  0.23717  0.23476  
9  –  1.57798  –  1.34163  –  1.37066 
Let our Monte Carlo simulation data be
The mean squared error and the bias are calculated as follows:
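As a sketch of these two summary measures, the code below uses the usual definitions: the bias is the average of the estimation errors over the replications, and the MSE is the average of their squares. The function name and the example estimates are illustrative, and the exact notation of the paper is not reproduced.

```python
def mse_and_bias(estimates, true_value):
    """Return (MSE, bias) of a list of estimates of one parameter,
    one estimate per Monte Carlo replication."""
    n = len(estimates)
    errors = [e - true_value for e in estimates]
    bias = sum(errors) / n
    mse = sum(err ** 2 for err in errors) / n
    return mse, bias

# Illustrative call with made-up estimates of a parameter equal to 0.6.
mse, bias = mse_and_bias([0.5, 0.7, 0.6], 0.6)
```

With these definitions the MSE equals the squared bias plus the variance of the estimates around their own mean, so a small bias together with a large MSE points to high variability of the estimator.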
Kernel density estimate for the bias of the dependence parameter estimates in the Monte Carlo simulation
The standard errors of the parameter bias of the Monte Carlo simulation are presented in Table
The results in Table
The authors would like to thank the anonymous referee for his/her feedback and constructive insights, which helped to improve this paper.