The paper presents a characterization of equilibrium in a game-theoretic description of a discounting conditional stochastic linear-quadratic (LQ for short) optimal control problem, in which the controlled state process evolves according to a multidimensional linear stochastic differential equation whose noise is driven by a Poisson process and an independent Brownian motion under the effect of Markovian regime-switching. The running and terminal costs in the objective functional depend explicitly on several quadratic terms of the conditional expectation of the state process as well as on a nonexponential discount function, which creates the time-inconsistency of the considered model. Open-loop Nash equilibrium controls are described through necessary and sufficient equilibrium conditions. A state-feedback equilibrium strategy is obtained via a certain differential-difference system of ODEs. As an application, we study equilibrium investment–consumption and reinsurance/new-business strategies under mean-variance utility for insurers whose risk aversion is a function of the current wealth level. The financial market consists of one riskless asset and one risky asset whose price process is modeled by a geometric Lévy process, and the surplus of the insurer is assumed to follow a jump-diffusion model, where the values of the parameters change according to a continuous-time Markov chain. A numerical example is provided to demonstrate the efficacy of the theoretical results.
Keywords: stochastic maximum principle; time-inconsistency; LQ control problem; equilibrium control; variational inequality. MSC: 93E20; 60H30; 93E99; 60H10.
Introduction
For usual optimal control problems, by Bellman's principle of optimality [40] one may check that an optimal control remains optimal when restricted to a later time interval; that is, optimal controls are time-consistent. The time-consistency feature provides a powerful tool for dealing with optimal control problems. The dynamic programming principle establishes relationships among a family of time-consistent optimal control problems parameterized by initial pairs (of time and state) through the so-called Hamilton–Jacobi–Bellman (HJB) equation, which is a nonlinear partial differential equation. If the HJB equation is solvable, then one can find an optimal feedback control by taking the optimizer of the generalized Hamiltonian involved in the HJB equation.
However, in reality, time-consistency can be lost in various ways: as time passes, an optimal control may fail to remain optimal. Among the possible causes of time-inconsistency, three play particularly important roles:
the appearance of conditional expectations for the state data in the objective functional [3],
the presence of a state-dependent risk aversion in the objective functional [4],
the nonexponential discounting situation [16].
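The third source above can be made concrete with a small numeric illustration (not from the paper; the parameter values are hypothetical). Under exponential discounting the relative weight \(h(t+s)/h(t)\) assigned to a payoff \(s\) units of time ahead does not depend on the evaluation date \(t\), so preferences formulated at different dates agree; under hyperbolic discounting \(h(t)=1/(1+kt)\) this ratio drifts with \(t\), which is precisely why a plan made at time 0 is no longer optimal when re-evaluated later.

```python
# Illustrative sketch: why nonexponential discounting breaks time-consistency.
import math

def exponential(t, rho=0.1):
    return math.exp(-rho * t)

def hyperbolic(t, k=0.1):
    return 1.0 / (1.0 + k * t)

def relative_weight(h, t, s):
    """Weight the decision-maker at time t assigns to a payoff s units later."""
    return h(t + s) / h(t)

# Exponential: the ratio is the same whatever the evaluation date t.
r0 = relative_weight(exponential, 0.0, 1.0)
r5 = relative_weight(exponential, 5.0, 1.0)
print(abs(r0 - r5) < 1e-12)  # True: time-consistent

# Hyperbolic: the ratio changes with t, so a plan made at t=0 is
# re-evaluated differently at t=5 -- the source of time-inconsistency.
print(relative_weight(hyperbolic, 0.0, 1.0))
print(relative_weight(hyperbolic, 5.0, 1.0))
```

For the hyperbolic case the two printed ratios differ (1/1.1 versus 1.5/1.6), whereas for the exponential case they coincide exactly.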
The portfolio optimization problem with a hyperbolic discount function [11] and state-dependent risk aversion in mean-variance models [17, 43] and [44] are two well-known instances of time-inconsistency in mathematical finance. Motivated by the second example, the present paper studies a general linear-quadratic optimal control problem for jump diffusions, which is time-inconsistent in the sense that it does not satisfy the Bellman optimality principle, owing to quadratic terms in the conditional expectation of the controlled state process as well as a state-dependent risk-aversion term in the running and terminal cost functionals. The fundamental challenge with time-inconsistent optimal control models is that, in general, the dynamic programming approach and the standard HJB equation cannot be employed. One way around the time-inconsistency issue is to consider only precommitted strategies; see, e.g., [45] and [26].
However, the main approach to time-inconsistency is to treat time-inconsistent problems as noncooperative games, in which decisions are taken by a different player at each moment of time, each intending to maximize or minimize his own objective functional. As a result, Nash equilibria are sought rather than optimal solutions; see, e.g., [3, 8, 11, 15, 16, 23, 24, 28, 37, 38] and [39]. Strotz [28] was the first to apply this game perspective, in dealing with the dynamic time-inconsistent decision problem posed by the deterministic Ramsey problem. He proposed a rudimentary notion of Nash equilibrium strategy by capturing the concept of noncommitment and allowing the commitment period to be infinitesimally small. Further references extending [28] are [16, 24] and [13]. Ekeland and Pirvu [11] gave a formal definition of feedback Nash equilibrium controls in a continuous-time setting in order to investigate the optimal investment–consumption problem under general discount functions in both deterministic and stochastic frameworks. Björk and Murgoci [3] and Ekeland et al. [10] further extended Ekeland and Pirvu's work. Yong [39] proposed an alternative method for analyzing general discounting time-inconsistent optimal control problems in continuous time by considering a discrete-time counterpart. Zhao et al. [42] investigated the consumption–investment problem under a general discount function and a logarithmic utility function using Yong's method. Wang and Wu investigated a partially observed time-inconsistent recursive optimization problem in [33]. Basak and Chabakauri [1] touched upon the continuous-time Markowitz mean-variance portfolio selection problem, while Björk et al. [4] addressed mean-variance portfolio selection with state-dependent risk aversion. Hu et al. [15], followed by Czichowsky [8], found a time-consistent strategy for mean-variance portfolio selection in a non-Markovian framework.
Linear-quadratic optimal control problems are well known as a fundamental class of optimal control problems, since they cover a wide range of applications, such as the mean-variance portfolio selection model in finance. Furthermore, the LQ model can be used to approximate many nonlinear control problems. In recent years, time-inconsistent LQ control problems have attracted considerable attention. Yong [37] worked on a general discounted time-inconsistent deterministic LQ model and used a forward ordinary differential equation coupled with a backward Riccati–Volterra integral equation to obtain closed-loop equilibrium strategies. Hu et al. [15] presented a precise definition of open-loop Nash equilibrium controls in a continuous-time setting, distinct from that for the feedback controls in [11], in order to analyze a time-inconsistent stochastic linear-quadratic optimal control problem with stochastic coefficients. Yong [39] studied a time-inconsistent stochastic LQ problem for mean-field-type stochastic differential equations. Finally, Hu et al. [14] investigated the uniqueness of the equilibrium solution found in [15]; they were the first to give a positive result regarding the uniqueness of the solution to a time-inconsistent problem.
There is little work in the literature concerning equilibrium strategies for optimal investment and reinsurance problems under the mean-variance criterion. Zeng and Li [43] were the first to study Nash equilibrium strategies for mean-variance insurers with constant risk aversion, where the surplus process of the insurer is described by a diffusion model and the price processes of the risky stocks are driven by geometric Brownian motions; they obtained equilibrium reinsurance and investment strategies explicitly using the technique of [3]. Li and Li [17] obtained equilibrium strategies in the case of state-dependent risk aversion through a set of well-posed integral equations. Zeng et al. [44] investigated time-consistent investment and reinsurance strategies for mean-variance insurers under constant risk aversion, in which the surplus process and the price process of the risky stock are both jump-diffusion processes.
Markov regime-switching models have recently attracted considerable interest in financial applications; see, for example, [46, 5, 6, 34] and [18]. Markov regime-switching models allow the market to face a financial crisis at any moment. At any given time the market is supposed to be governed by some regime; a bull market, in which stock prices are generally increasing, is a standard illustration of such a regime. After a financial crisis the market's behavior changes radically, and a switch in the regime represents the crisis. The problem of mean-variance optimization under a continuous-time Markov regime-switching financial market was first studied by Zhou and Yin [46]. By applying stochastic linear-quadratic control methods, they obtained mean-variance efficient portfolios and efficient frontiers via solving two systems of ordinary linear differential equations. In continuous-time and multiperiod settings, Chen et al. [5] and Chen and Yang [6], respectively, studied the mean-variance asset-liability management problem. Mean-variance asset-liability management problems with a continuous-time Markov regime-switching setup have been studied by Wei et al. [34], who explicitly deduced a time-consistent investment strategy using the method of [3]. Liang and Song [18] investigated optimal investment and reinsurance problems for insurers with mean-variance utility under partial information, where the stock's drift rate and the insurer's risk aversion are both Markov-modulated.
In this work, we present a general time-inconsistent stochastic conditional LQ control problem. Unlike most existing studies [15, 39, 2, 42], where the noise is driven by a Brownian motion, in our LQ system the state evolves according to an SDE whose noise is driven by a multidimensional Brownian motion and an independent multidimensional Poisson point process under a Markov regime-switching setup. Continuous-time mean-variance criteria with state-dependent risk aversion are included as special cases of the objective functional. We establish a stochastic system that describes open-loop Nash equilibrium controls, using the variational technique proposed by Hu et al. [14]. We emphasize that our model generalizes the ones investigated by Zeng and Li [43], Li et al. [17], Sun and Guo [30] and Zeng et al. [44], in addition to some classes of time-inconsistent stochastic LQ optimal control problems introduced in [15].
The paper is organized as follows. In the second section, we formulate the problem and provide essential notations and preliminaries. Section 3 is dedicated to the necessary and sufficient conditions for equilibrium, which are our main results; we then obtain the unique equilibrium control in state-feedback representation through a specific class of ordinary differential equations. In the last section, we apply the results of Section 3 to find the unique equilibrium reinsurance, investment and consumption strategies for the mean-variance-utility portfolio problem, and we discuss some special cases. The paper concludes with an Appendix that includes some proofs.
Problem setting
Let \((\Omega,\mathcal F,\mathbb F,P)\) be a filtered probability space, where \(\mathbb F:=\{\mathcal F_t\}_{t\in[0,T]}\) is a right-continuous, \(P\)-completed filtration to which all of the processes outlined below are adapted, namely the Markov chain, the Brownian motions, and the Poisson random measures.
Throughout the present paper, we assume that the Markov chain α(·) takes values in the finite state space χ={e1,e2,…,ed}, where d∈N, ei∈Rd and the j-th component of ei is the Kronecker delta δij for each i,j∈{1,…,d}. H:=(λij)1≤i,j≤d denotes the rate matrix of the Markov chain under P. Note that λij is the constant transition intensity of the chain from state ei to state ej. As a result, for i≠j we have λij≥0 and ∑j=1d λij=0, thus λii≤0. In the sequel, we assume that λij>0 for each i,j=1,2,…,d with i≠j; consequently, λii<0. We have the following semimartingale representation of the Markov chain α(·), obtained from Elliott et al. [12]:
\[ \alpha(t)=\alpha(0)+\int_0^t H^\top \alpha(\tau)\,d\tau+M(t), \]
where {M(t)|t∈[0,T]} is an Rd-valued (F,P)-martingale.
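The generator property of the rate matrix H can be checked, and the chain itself simulated, with a short sketch. This is not from the paper: the 3-state rate matrix below is hypothetical, and the simulation uses the standard construction (exponential holding time with rate \(-\lambda_{ii}\), then a jump to \(e_j\) with probability \(\lambda_{ij}/(-\lambda_{ii})\)).

```python
# A minimal sketch of the finite-state Markov chain alpha(.) with rate matrix
# H = (lambda_ij): off-diagonal entries are nonnegative transition intensities
# and each row sums to zero, so lambda_ii = -sum_{j != i} lambda_ij < 0.
import random

H = [[-0.5, 0.3, 0.2],
     [ 0.4, -0.7, 0.3],
     [ 0.1, 0.6, -0.7]]   # hypothetical 3-state rate matrix

for row in H:              # generator property: every row sums to zero
    assert abs(sum(row)) < 1e-12

def simulate_chain(H, i0, T, rng):
    """Simulate a path of the chain on [0, T]: exponential holding time with
    rate -H[i][i], then jump to state j with probability H[i][j]/(-H[i][i])."""
    t, i, path = 0.0, i0, [(0.0, i0)]
    while True:
        rate = -H[i][i]
        t += rng.expovariate(rate)
        if t >= T:
            return path
        u, acc = rng.random(), 0.0
        for j in range(len(H)):
            if j == i:
                continue
            acc += H[i][j] / rate
            if u <= acc:
                i = j
                break
        path.append((t, i))

rng = random.Random(0)
path = simulate_chain(H, 0, 10.0, rng)
print(path[0] == (0.0, 0))  # True: the path starts at the initial state
```

The jump counts of such a path are exactly the processes \(J_{ij}(t)\) and \(\Phi_j(t)\) introduced next.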
First, we introduce a set of Markov jump martingales associated with the chain α(·), which will be used to model the controlled state process. For each i,j∈{1,…,d} with i≠j, and t∈[0,T], denote by \(J_{ij}(t):=\lambda_{ij}\int_0^t\langle\alpha(\tau-),e_i\rangle\,d\tau+m_{ij}(t)\) the number of jumps from state ei to state ej up to time t, where \(m_{ij}(t):=\int_0^t\langle\alpha(\tau-),e_i\rangle\,\langle dM(\tau),e_j\rangle\) is an (F,P)-martingale. Let Φj(t) denote the number of jumps into state ej up to time t, for each fixed j=1,2,…,d; then
\[ \Phi_j(t)=\sum_{i=1,\,i\neq j}^{d}J_{ij}(t)=\sum_{i=1,\,i\neq j}^{d}\lambda_{ij}\int_0^t\langle\alpha(\tau-),e_i\rangle\,d\tau+\tilde\Phi_j(t), \]
where Φ˜j(t):=∑i=1,i≠jdmij(t) is an F,P-martingale for each j=1,2,…,d. For each j=1,2,…,d set
\[ \lambda_j(t)=\sum_{i=1,\,i\neq j}^{d}\lambda_{ij}\int_0^t\langle\alpha(\tau),e_i\rangle\,d\tau. \]
Note that the process Φ˜j(t)=Φj(t)−λj(t) is an F,P-martingale, for each j=1,2,…,d.
Now, we present the Markov regime-switching Poisson random measures. Assume that Ni(dt,dz), i=1,2,…,l, are independent Poisson random measures on \(([0,T]\times\mathbb R_0,\ \mathcal B([0,T])\otimes\mathcal B_0)\) under P, and that the compensator of the Poisson random measure Ni(dt,dz) is given by
\[ n_\alpha^i(dt,dz):=\theta_{\alpha(t-)}^i(dz)\,dt=\langle\alpha(t-),\theta^i(dz)\rangle\,dt, \]
where \(\theta^i(dz):=(\theta_{e_1}^i(dz),\theta_{e_2}^i(dz),\dots,\theta_{e_d}^i(dz))^\top\in\mathbb R^d\). The subscript α in \(n_\alpha^i\), for i=1,2,…,l, represents the dependence of the probability law of the Poisson random measure on the Markov chain α(·). In fact, \(\theta_{e_j}^i(dz)\) is the conditional Lévy density of the jump sizes of the random measure Ni(dt,dz) at time t when α(t−)=ej, for each j=1,2,…,d. Furthermore, the compensated Poisson random measure \(\tilde N_\alpha(dt,dz)\) is given by
\[ \tilde N_\alpha(dt,dz)=\big(N^1(dt,dz)-n_\alpha^1(dt,dz),\ \dots,\ N^l(dt,dz)-n_\alpha^l(dt,dz)\big)^\top. \]
Notations
Throughout this paper, we use the following notations: \(\mathbb S^n\) is the set of n×n symmetric real matrices; \(C^\top\) is the transpose of the vector (or matrix) C; \(\langle\cdot,\cdot\rangle\) is the inner product in some Euclidean space. For any Euclidean space \(H=\mathbb R^n\) or \(\mathbb S^n\) with Frobenius norm \(|\cdot|\), and p,l,d∈N, we denote for any t∈[0,T]:
\( L^p(\Omega,\mathcal F_t,P;H)=\{\xi:\Omega\to H \mid \xi\ \text{is } \mathcal F_t\text{-measurable and } \mathbb E|\xi|^p<\infty\} \), for any \(p\ge 1\);
Throughout this paper, we consider a multidimensional nonhomogeneous linear controlled jump-diffusion system starting from the initial data \((t,\xi,e_i)\in[0,T]\times L^2(\Omega,\mathcal F_t^\alpha,P;\mathbb R^n)\times\chi\), defined by
\[
\begin{aligned}
dX(s)={}&\big[A(s,\alpha(s))X(s)+B(s,\alpha(s))u(s)+b(s,\alpha(s))\big]\,ds\\
&+\sum_{i=1}^{p}\big[C_i(s,\alpha(s))X(s)+D_i(s,\alpha(s))u(s)+\sigma_i(s,\alpha(s))\big]\,dW_i(s)\\
&+\sum_{k=1}^{l}\int_{\mathbb R^*}\big[E_k(s,z,\alpha(s))X(s-)+F_k(s,z,\alpha(s))u(s)+c_k(s,z,\alpha(s))\big]\,\tilde N_\alpha^k(ds,dz),\quad s\in[t,T],\\
X(t)={}&\xi,\qquad \alpha(t)=e_i.
\end{aligned}\tag{2.1}
\]
The coefficients \(A(\cdot,\cdot), C_i(\cdot,\cdot):[0,T]\times\chi\to\mathbb R^{n\times n}\); \(B(\cdot,\cdot),D_i(\cdot,\cdot):[0,T]\times\chi\to\mathbb R^{n\times m}\); \(b(\cdot,\cdot),\sigma_i(\cdot,\cdot):[0,T]\times\chi\to\mathbb R^n\); \(E_k(\cdot,\cdot,\cdot):[0,T]\times\mathbb R^*\times\chi\to\mathbb R^{n\times n}\); \(F_k(\cdot,\cdot,\cdot):[0,T]\times\mathbb R^*\times\chi\to\mathbb R^{n\times m}\); \(c_k(\cdot,\cdot,\cdot):[0,T]\times\mathbb R^*\times\chi\to\mathbb R^n\) are deterministic matrix-valued functions. For any t∈[0,T], the class of admissible control processes over [t,T] is restricted to \(L_{\mathbb F,p}^2(t,T;\mathbb R^m)\). For any \(u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\), we denote by \(X(\cdot)=X^{t,\xi,e_i}(\cdot;u(\cdot))\) its solution; different controls u(·) lead to different solutions X(·).
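The state dynamics (2.1) can be approximated by an Euler scheme. The sketch below is a heavily simplified illustration, not the paper's general setup: scalar state, a single regime (so the coefficients are constants), one Brownian motion, and one compensated compound Poisson process with Exp(1) jump sizes; every parameter value is hypothetical.

```python
# Euler sketch for a scalar special case of (2.1):
# dX = (A X + B u + b) ds + (C X + D u + sig) dW + E X d(comp. compound Poisson).
import math, random

def euler_path(x0, T=1.0, n=200, A=-0.2, B=0.5, b=0.1,
               C=0.3, D=0.1, sig=0.2, E=0.05, lam=2.0,
               u=lambda s, x: 0.0, rng=None):
    rng = rng or random.Random(0)
    dt, x = T / n, x0
    mean_jump = 1.0                       # E[z] for Exp(1) jump sizes
    for k in range(n):
        s = k * dt
        us = u(s, x)
        dW = rng.gauss(0.0, math.sqrt(dt))
        # compound Poisson increment over [s, s+dt], then compensate it
        jump_sum, t_acc = 0.0, rng.expovariate(lam)
        while t_acc < dt:
            jump_sum += rng.expovariate(1.0)  # jump size z ~ Exp(1)
            t_acc += rng.expovariate(lam)
        comp = jump_sum - lam * mean_jump * dt
        x += (A * x + B * us + b) * dt + (C * x + D * us + sig) * dW + E * x * comp
    return x

x_T = euler_path(1.0)
print(math.isfinite(x_T))  # True
```

In the paper's full model the coefficients would additionally be re-evaluated at the current regime α(s) along a simulated chain path, and the Brownian and Poisson drivers would be multidimensional.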
In practice, an observable switching process is used to represent, for instance, interest-rate processes over various market settings. For example, the market may be broadly split into “bullish” and “bearish” states, with characteristics varying greatly between the two modes. Applications of switching models in mathematical finance can be found, for example, in [5, 6] and the references therein.
To measure the performance of u·∈LF,p2t,T;Rm, we introduce the following cost functional
\[
\begin{aligned}
J(t,\xi,e_i;u(\cdot))={}&\mathbb E\bigg[\int_t^T\frac12\Big(\langle Q(s)X(s),X(s)\rangle+\big\langle \bar Q(s)\,\mathbb E[X(s)\mid\mathcal F_s^\alpha],\,\mathbb E[X(s)\mid\mathcal F_s^\alpha]\big\rangle+\langle R(t,s)u(s),u(s)\rangle\Big)ds\\
&+\langle \mu_1\xi+\mu_2,X(T)\rangle+\frac12\langle GX(T),X(T)\rangle+\frac12\big\langle \bar G\,\mathbb E[X(T)\mid\mathcal F_T^\alpha],\,\mathbb E[X(T)\mid\mathcal F_T^\alpha]\big\rangle\bigg].
\end{aligned}\tag{2.2}
\]
Due to the general influence of the modulating switching process α(·), conditional expectation is employed rather than expectation in (2.2). The presence of α(·) in all coefficients of the state equation (2.1) makes the objective functional depend on the process's history. This type of cost functional is also motivated by practical problems such as the conditional mean-variance portfolio selection problem considered in Section 4 of this paper. A reader interested in this type of problem is referred to [21] and [19]. The term \(\langle\mu_1\xi+\mu_2,X(T)\rangle\) stems from a state-dependent utility function in economics [4].
We need to impose the following assumptions on the coefficients.
The functions A·,·, B·,·, b·,·, Ci·,·, Di·,·, σi·,·, Ek·,·,·, Fk·,·,· and ck·,·,· are deterministic, continuous and uniformly bounded. The coefficients of the cost functional satisfy
\( Q(\cdot),\bar Q(\cdot)\in C([0,T];\mathbb S^n),\quad R(\cdot,\cdot)\in C(D[0,T];\mathbb S^m),\quad G,\bar G\in\mathbb S^n,\quad \mu_1\in\mathbb R^{n\times n},\quad \mu_2\in\mathbb R^n, \)
where \(D[0,T]:=\{(t,s):0\le t\le s\le T\}\).
The functions R(·,·), Q(·) and G satisfy \(R(t,t)\ge 0\), \(Q(t)\ge 0\) for all t∈[0,T], and \(G\ge 0\).
Based on [25], we can prove under (H1) that, for any \((t,\xi,e_i)\in[0,T]\times L^2(\Omega,\mathcal F_t^\alpha,P;\mathbb R^n)\times\chi\) and \(u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\), the state equation (2.1) has a unique solution \(X(\cdot)\in S_{\mathbb F}^2(t,T;\mathbb R^n)\). Moreover, we have the estimate
\[ \mathbb E\Big[\sup_{t\le s\le T}|X(s)|^2\Big]\le K\big(1+\mathbb E|\xi|^2\big), \]
for some positive constant K. In particular for t=0 and u·∈LF,p20,T;Rm, Equation (2.1) starting from initial data 0,x0 has a unique solution X·∈SF2(0,T;Rn) for which
\[ \mathbb E\Big[\sup_{0\le s\le T}|X(s)|^2\Big]\le K\big(1+|x_0|^2\big). \]
Our optimal control problem can be formulated as follows.
Problem (N). For any initial triplet \((t,\xi,e_i)\in[0,T]\times L^2(\Omega,\mathcal F_t^\alpha,P;\mathbb R^n)\times\chi\), find a control \(\hat u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\) such that
\[ J(t,\xi,e_i;\hat u(\cdot))=\min_{u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)}J(t,\xi,e_i;u(\cdot)). \]
Any \(\hat u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\) satisfying the above is called a pre-commitment optimal control. The presence of quadratic terms in the conditional expectation of the state process, as well as of a state-dependent term in the objective functional, destroys the time-consistency of pre-committed optimal solutions. Hence, Problem (N) is time-inconsistent, with two different sources of time-inconsistency.
The main results: characterization and uniqueness of equilibrium
In view of the fact that Problem (N) is time-inconsistent, the aim of this paper is to characterize open-loop Nash equilibria as an alternative to optimal strategies. We employ the game-theoretic approach to handle the time-inconsistency from the same viewpoint as Ekeland and Pirvu [11] and Björk and Murgoci [3]. Let us briefly explain the game perspective that we will consider.
We consider a game with one player at every point t in the interval [0,T). This player corresponds to the incarnation of the controller on instant t and is referred to as “player t”.
The t-th player controls the system only at time t by choosing his/her policy \(u(t,\cdot):\Omega\to\mathbb R^m\).
A control process u(·) is then viewed as a complete description of the chosen strategies of all players in the game.
The reward to the player t is specified by the functional Jt,ξ,ei;u·.
We explain the concept of a “Nash equilibrium strategy” for the game described above: this is an admissible control process \(\hat u(\cdot)\) fulfilling the following criterion. Assume that every player s, with s>t, will apply the strategy \(\hat u(s)\); then the optimal decision for player t is to also use the strategy \(\hat u(t)\). However, the difficulty with this “definition” is that the individual player t has no effect on the game's outcome: he/she selects the control only at the single instant t and, since this is a time set of Lebesgue measure zero, the control dynamics are unaffected.
As a result, to identify open-loop Nash equilibrium controls, we follow [15], where a formal definition (Definition 4 below), inspired by [11], is proposed.
In the rest of the paper, for brevity, we suppress the arguments (s,α(s)) in the coefficients A(s,α(s)), B(s,α(s)), b(s,α(s)), Ci(s,α(s)), Di(s,α(s)), σi(s,α(s)); we likewise suppress the arguments s and (s,t) in the coefficients Q(s), Q̄(s), R(s,t); and we use the notation ϱ(z) instead of ϱ(s,z,α(s)) for ϱ=Ek,Fk,ck. Furthermore, we sometimes simply call \(\hat u(\cdot)\) an equilibrium control instead of an open-loop Nash equilibrium control when there is no confusion.
In this section, we provide the main results on the necessary and sufficient conditions for equilibrium of the control problem formulated in the preceding section. To make the presentation clearer, the proofs are relegated to Appendix A. To proceed towards the definition of an equilibrium, we first introduce the local spike variation of a given admissible control \(\hat u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\): for any t∈[0,T], \(v\in L^2(\Omega,\mathcal F_{t-}^\alpha,P;\mathbb R^m)\) and ε∈(0,T−t), define
\[ u^\varepsilon(s)=\begin{cases}\hat u(s)+v, & s\in[t,t+\varepsilon),\\[2pt] \hat u(s), & s\in[t+\varepsilon,T].\end{cases}\tag{3.1} \]
We have the following definition. (Open-loop Nash equilibrium).
An admissible control uˆ·∈LF,p2(t,T;Rm) is an open-loop Nash equilibrium control for Problem (N) if for every sequence εn↓0, we have
\[ \lim_{\varepsilon_n\downarrow 0}\frac{1}{\varepsilon_n}\Big[J\big(t,\hat X(t),\alpha(t);u^{\varepsilon_n}(\cdot)\big)-J\big(t,\hat X(t),\alpha(t);\hat u(\cdot)\big)\Big]\ge 0,\tag{3.2} \]
for any t∈0,T and v∈L2Ω,Ft−α,P;Rm. The corresponding equilibrium dynamics solves the following SDE with jumps: for s∈0,T,
\[
\begin{aligned}
d\hat X(s)={}&\big[A\hat X(s)+B\hat u(s)+b\big]\,ds+\sum_{i=1}^{p}\big[C_i\hat X(s)+D_i\hat u(s)+\sigma_i\big]\,dW_i(s)\\
&+\sum_{k=1}^{l}\int_{\mathbb R^*}\big[E_k(z)\hat X(s-)+F_k(z)\hat u(s)+c_k(z)\big]\,\tilde N_\alpha^k(ds,dz),\\
\hat X(0)={}&x_0,\qquad \alpha(0)=e_{i_0}.
\end{aligned}
\]
Flow of the adjoint equations and characterization of equilibrium controls
In this subsection, we provide general necessary and sufficient conditions characterizing the equilibrium strategies of Problem (N). First, we consider the adjoint equations used in the characterization of equilibrium controls. Let \(\hat u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\) be a fixed control and denote by \(\hat X(\cdot)\in S_{\mathbb F}^2(0,T;\mathbb R^n)\) its corresponding state process. For each t∈[0,T], the first-order adjoint equation, defined on the time interval [t,T] and satisfied by the 4-tuple of processes \((p(\cdot;t),q(\cdot;t),r(\cdot,\cdot;t),l(\cdot;t))\), is given as follows:
\[
\begin{aligned}
dp(s;t)={}&-\Big[A^\top p(s;t)+\sum_{i=1}^{p}C_i^\top q_i(s;t)+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)^\top r_k(s,z;t)\,\theta_\alpha^k(dz)-Q\hat X(s)-\bar Q\,\mathbb E\big[\hat X(s)\mid\mathcal F_s^\alpha\big]\Big]\,ds\\
&+\sum_{i=1}^{p}q_i(s;t)\,dW_i(s)+\sum_{k=1}^{l}\int_{\mathbb R^*}r_k(s,z;t)\,\tilde N_\alpha^k(ds,dz)+\sum_{j=1}^{d}l_j(s;t)\,d\tilde\Phi_j(s),\quad s\in[t,T],\\
p(T;t)={}&-G\hat X(T)-\bar G\,\mathbb E\big[\hat X(T)\mid\mathcal F_T^\alpha\big]-\mu_1\hat X(t)-\mu_2.
\end{aligned}\tag{3.3}
\]
Throughout this section, we will prove that the equilibrium strategy can be obtained by solving a system of FBSDEs which is nonstandard, since the flow of the unknown processes \((p(\cdot;t),q(\cdot;t),r(\cdot,\cdot;t),l(\cdot;t))\) for t∈[0,T] is involved. To the best of our knowledge, solving this type of equation explicitly remains an open problem, except for certain forms of the objective functional. However, by a separation-of-variables approach we are able to solve the present problem completely.
Consider a deterministic matrix-valued function ϕ(·,·) as a solution of the following ODE:
\[ d\phi(s,\alpha(s))=\phi(s,\alpha(s))A^\top\,ds,\quad s\in[0,T],\qquad \phi(T,e_i)=I_n. \]
For any t∈[0,T] and s∈[t,T], the solution of Equation (3.3) has the representation
\[ p(s;t)=-\phi(s,\alpha(s))^{-1}\Big[\bar p(s)+\bar G\,\mathbb E\big[\hat X(T)\mid\mathcal F_T^\alpha\big]+\mu_1\hat X(t)+\mu_2\Big]-\phi(s,\alpha(s))^{-1}\int_s^T\phi(\tau,\alpha(\tau))\bar Q\,\mathbb E\big[\hat X(\tau)\mid\mathcal F_\tau^\alpha\big]\,d\tau, \]
and \((q_i(s;t),r_k(s,z;t),l_j(s;t))=-\phi(s,\alpha(s))^{-1}(\bar q_i(s),\bar r_k(s,z),\bar l_j(s))\) for i=1,2,…,p; k=1,2,…,l; j=1,2,…,d, where
\[
\begin{aligned}
d\bar p(s)={}&-\Big[\sum_{i=1}^{p}\phi(s,\alpha(s))C_i^\top\phi(s,\alpha(s))^{-1}\bar q_i(s)+\sum_{k=1}^{l}\int_{\mathbb R^*}\phi(s,\alpha(s))E_k(z)^\top\phi(s,\alpha(s))^{-1}\bar r_k(s,z)\,\theta_\alpha^k(dz)+\phi(s,\alpha(s))Q\hat X(s)\Big]\,ds\\
&+\sum_{i=1}^{p}\bar q_i(s)\,dW_i(s)+\sum_{k=1}^{l}\int_{\mathbb R^*}\bar r_k(s-,z)\,\tilde N_\alpha^k(ds,dz)+\sum_{j=1}^{d}\bar l_j(s)\,d\tilde\Phi_j(s),\quad s\in[t,T],\\
\bar p(T)={}&G\hat X(T).
\end{aligned}\tag{3.4}
\]
We remark that neither the coefficients nor the terminal condition of (3.4) depend on the starting time t, so (3.4) may be considered as a standard BSDE over the entire time interval [0,T]; following the same arguments as in [27], one can verify that Equation (3.4) admits a unique solution.
From the representation of p·;t,q·;t,r·,·;t,l·;t, for t∈0,T given by Lemma 5, we can check that under (H1) Equation (3.3) admits a unique solution
\( \big(p(\cdot;t),q(\cdot;t),r(\cdot,\cdot;t),l(\cdot;t)\big)\in S_{\mathbb F}^2(t,T;\mathbb R^n)\times L^2(t,T;(\mathbb R^n)^p)\times L_{\mathbb F,p}^{\theta,2}([t,T]\times\mathbb R^*;(\mathbb R^n)^l)\times L_{\mathbb F,p}^{\lambda,2}(t,T;(\mathbb R^n)^d). \)
The following second order adjoint equation is defined on the time interval t,T and satisfied by the 4-tuple of processes P·,Λ·,Γ·;·,L·:
\[
\begin{aligned}
dP(s)={}&-\Big[A^\top P(s)+P(s)A+\sum_{i=1}^{p}\big(C_i^\top P(s)C_i+\Lambda_i(s)C_i+C_i^\top\Lambda_i(s)\big)\\
&\qquad+\sum_{k=1}^{l}\int_{\mathbb R^*}\big(\Gamma_k(s,z)E_k(z)+E_k(z)^\top\Gamma_k(s,z)\big)\,\theta_\alpha^k(dz)\\
&\qquad+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)^\top\big(\Gamma_k(s,z)+P(s)\big)E_k(z)\,\theta_\alpha^k(dz)-Q\Big]\,ds\\
&+\sum_{i=1}^{p}\Lambda_i(s)\,dW_i(s)+\sum_{k=1}^{l}\int_{\mathbb R^*}\Gamma_k(s,z)\,\tilde N_\alpha^k(ds,dz)+\sum_{j=1}^{d}L_j(s)\,d\tilde\Phi_j(s),\quad s\in[t,T],\\
P(T)={}&-G.
\end{aligned}\tag{3.5}
\]
Noting that (3.5) is a standard BSDE over the entire time interval [0,T], and following the same arguments as in [27], we can verify that Equation (3.5) admits a unique solution
\( \big(P(\cdot),\Lambda(\cdot),\Gamma(\cdot,\cdot),L(\cdot)\big)\in S_{\mathbb F}^2(t,T;\mathbb S^n)\times L^2(t,T;(\mathbb S^n)^p)\times L_{\mathbb F,p}^{\theta,2}([t,T]\times\mathbb R^*;(\mathbb S^n)^l)\times L_{\mathbb F,p}^{\lambda,2}(t,T;(\mathbb S^n)^d). \)
Now, associated with \((\hat u(\cdot),\hat X(\cdot),p(\cdot;\cdot),q(\cdot;\cdot),r(\cdot,\cdot;\cdot),P(\cdot),\Gamma(\cdot;\cdot))\), we define, for (s,t)∈D[0,T],
\[ U(s;t)=B^\top p(s;t)+\sum_{i=1}^{p}D_i^\top q_i(s;t)+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top r_k(s,z;t)\,\theta_\alpha^k(dz)-R\hat u(s),\tag{3.6} \]
and
\[ V(s;t)=\sum_{i=1}^{p}D_i^\top P(s)D_i+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top\big(P(s)+\Gamma_k(s,z)\big)F_k(z)\,\theta_\alpha^k(dz)-R.\tag{3.7} \]
Definition 4 is slightly different from the original definition provided by [15] and [14], where the open-loop equilibrium control is given by
\[ \lim_{\varepsilon\downarrow 0}\frac{1}{\varepsilon}\Big[J\big(t,\hat X(t),\alpha(t);u^{\varepsilon}(\cdot)\big)-J\big(t,\hat X(t),\alpha(t);\hat u(\cdot)\big)\Big]\ge 0.\tag{3.8} \]
Although the limit (3.8) already provides a characterizing condition, it is not very useful because it involves an a.s. limit with respect to uncountably many ε>0. In that case, using the RCLL property of the state process X(·), one can deduce an equivalent condition for equilibrium; see Hu et al. [15]. In this paper, we define an open-loop equilibrium control in the sense of (3.2), which is well defined in general.
The following lemma, which will be used later in this study, provides an important property of the flow of adapted processes.
Under assumptions (H1)–(H2), for any \(\hat u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\), there exists a sequence \(\{\varepsilon_n^t\}_{n\in\mathbb N}\subset(0,T-t)\) with \(\varepsilon_n^t\to 0\) as \(n\to\infty\), such that
\[ \lim_{n\to\infty}\frac{1}{\varepsilon_n^t}\int_t^{t+\varepsilon_n^t}\mathbb E\big[U(s;t)\big]\,ds=U(t;t),\quad dP\text{-a.s.},\ dt\text{-a.e.} \]
Now we introduce the space
\[ \mathbb L=\Big\{\Lambda(\cdot;t)\in S_{\mathbb F}^2(t,T;\mathbb R^n)\ \text{such that}\ \sup_{t\in[0,T]}\mathbb E\Big[\sup_{s\in[t,T]}|\Lambda(s;t)|^2\Big]<+\infty\Big\}.
\]
Clearly, for any uˆ·∈LF,p20,T;Rm, its associated flow of adjoint processes p·;·∈L.
The following theorem is the first main result of this work; it provides necessary and sufficient conditions for equilibrium controls of the time-inconsistent Problem (N).
(Characterization of equilibrium).
Let (H1) hold. Given an admissible control \(\hat u(\cdot)\in L_{\mathbb F,p}^2(0,T;\mathbb R^m)\), let
\( \big(p(\cdot;\cdot),q(\cdot;\cdot),r(\cdot,\cdot;\cdot),l(\cdot;\cdot)\big)\in\mathbb L\times L_{\mathbb F}^2(0,T;(\mathbb R^n)^p)\times L_{\mathbb F,p}^{\theta,2}([0,T]\times\mathbb R^*;(\mathbb R^n)^l)\times L_{\mathbb F,p}^{\lambda,2}(0,T;(\mathbb R^n)^d) \)
be the unique solution to the BSDE (3.3), and let
\( \big(P(\cdot),\Lambda(\cdot),\Gamma(\cdot,\cdot),L(\cdot)\big)\in S_{\mathbb F}^2(t,T;\mathbb S^n)\times L^2(t,T;(\mathbb S^n)^p)\times L_{\mathbb F,p}^{\theta,2}([t,T]\times\mathbb R^*;(\mathbb S^n)^l)\times L_{\mathbb F,p}^{\lambda,2}(t,T;(\mathbb S^n)^d) \)
be the unique solution to the BSDE (3.5). Then \(\hat u(\cdot)\) is an open-loop Nash equilibrium if and only if the following two conditions hold: the first-order equilibrium condition
\[ U(t;t)=0,\quad dP\text{-a.s.},\ dt\text{-a.e.},\tag{3.11} \]
and the second-order equilibrium condition
\[ V(t;t)\le 0,\quad dP\text{-a.s.},\ \forall t\in[0,T],\tag{3.12} \]
where U(t;t) and V(t;t) are given by (3.6) and (3.7), respectively.
The proof of the above theorem is based on variational techniques, in the spirit of the characterizations of equilibria obtained in [14] and [15] in the absence of random jumps.
Let \(\hat u(\cdot)\in L_{\mathbb F,p}^2(0,T;\mathbb R^m)\) be an admissible control and \(\hat X(\cdot)\) the corresponding controlled state process. Consider the perturbed control \(u^\varepsilon(\cdot)\) defined by the spike variation (3.1) for some fixed arbitrary t∈[0,T], \(v\in L^2(\Omega,\mathcal F_{t-}^\alpha,P;\mathbb R^m)\) and ε∈(0,T−t). Denote by \(\hat X^\varepsilon(\cdot)\) the solution of the state equation corresponding to \(u^\varepsilon(\cdot)\). It follows from the standard perturbation approach (see, for example, [31] and [41]) that \(\hat X^\varepsilon(\cdot)-\hat X(\cdot)=y^{\varepsilon,v}(\cdot)+Y^{\varepsilon,v}(\cdot)\), where \(y^{\varepsilon,v}(\cdot)\) and \(Y^{\varepsilon,v}(\cdot)\) solve the following SDEs, respectively, for s∈[t,T]:
\[
\begin{aligned}
dy^{\varepsilon,v}(s)={}&Ay^{\varepsilon,v}(s)\,ds+\sum_{i=1}^{p}\big[C_iy^{\varepsilon,v}(s)+D_iv\mathbf 1_{[t,t+\varepsilon)}(s)\big]\,dW_i(s)\\
&+\sum_{k=1}^{l}\int_{\mathbb R^*}\big[E_k(z)y^{\varepsilon,v}(s-)+F_k(z)v\mathbf 1_{[t,t+\varepsilon)}(s)\big]\,\tilde N_\alpha^k(ds,dz),\qquad y^{\varepsilon,v}(t)=0,
\end{aligned}
\]
\[
\begin{aligned}
dY^{\varepsilon,v}(s)={}&\big[AY^{\varepsilon,v}(s)+Bv\mathbf 1_{[t,t+\varepsilon)}(s)\big]\,ds+\sum_{i=1}^{p}C_iY^{\varepsilon,v}(s)\,dW_i(s)\\
&+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)Y^{\varepsilon,v}(s-)\,\tilde N_\alpha^k(ds,dz),\qquad Y^{\varepsilon,v}(t)=0.
\end{aligned}
\]
We need the following lemma
Under assumption (H1), the following estimates hold:
\[ \sup_{s\in[t,T]}\mathbb E\big[|y^{\varepsilon,v}(s)|^2\big]=O(\varepsilon),\qquad \sup_{s\in[t,T]}\mathbb E\big[|Y^{\varepsilon,v}(s)|^2\big]=O(\varepsilon^2). \]
We also have
\[ \sup_{s\in[t,T]}\mathbb E\big[\big|\mathbb E[y^{\varepsilon,v}(s)\mid\mathcal F_s^\alpha]\big|^2\big]=O(\varepsilon^2). \]
Moreover, we have the equality
\[ J\big(t,\hat X(t),\alpha(t);u^{\varepsilon}(\cdot)\big)-J\big(t,\hat X(t),\alpha(t);\hat u(\cdot)\big)=-\int_t^{t+\varepsilon}\mathbb E\Big[\langle U(s;t),v\rangle+\frac12\langle V(s;t)v,v\rangle\Big]\,ds+o(\varepsilon).\tag{3.18} \]
Now, we are ready to give the proof of Theorem 9.
Given an admissible control \(\hat u(\cdot)\in L_{\mathbb F,p}^2(0,T;\mathbb R^m)\) for which (3.11) and (3.12) hold, according to Lemma 8 we have from (3.18) that, for any t∈[0,T] and any \(\mathbb R^m\)-valued, \(\mathcal F_t^\alpha\)-measurable and bounded random variable v, there exists a sequence \(\{\varepsilon_n^t\}_{n\in\mathbb N}\subset(0,T-t)\) with \(\varepsilon_n^t\to 0\) as \(n\to\infty\), such that
\[
\begin{aligned}
\lim_{n\to\infty}\frac{1}{\varepsilon_n^t}\Big[J\big(t,\hat X(t),\alpha(t);u^{\varepsilon_n^t}(\cdot)\big)-J\big(t,\hat X(t),\alpha(t);\hat u(\cdot)\big)\Big]&=-\Big[\langle U(t;t),v\rangle+\frac12\langle V(t;t)v,v\rangle\Big]\\
&=-\frac12\langle V(t;t)v,v\rangle\ \ge\ 0,\quad dP\text{-a.s.}
\end{aligned}
\]
Hence uˆ· is an equilibrium strategy.
Conversely, assume that \(\hat u(\cdot)\) is an equilibrium strategy. Then, by (3.2) together with (3.18) and Lemma 8, for any (t,u)∈[0,T]×\(\mathbb R^m\), the following inequality holds:
\[ \langle U(t;t),u\rangle+\frac12\langle V(t;t)u,u\rangle\le 0.\tag{3.19} \]
Now, define for all (t,u)∈[0,T]×\(\mathbb R^m\), \(\Phi(t,u)=\langle U(t;t),u\rangle+\frac12\langle V(t;t)u,u\rangle\). Easy manipulations show that inequality (3.19) is equivalent to \(\Phi(t,0)=\max_{u\in\mathbb R^m}\Phi(t,u)\), dP-a.s., for all t∈[0,T]. This maximum condition is in turn equivalent to the following two conditions:
\[ \Phi_u(t,0)=U(t;t)=0,\quad \forall t\in[0,T],\ dP\text{-a.s.},\qquad \Phi_{uu}(t,0)=V(t;t)\le 0,\quad \forall t\in[0,T],\ dP\text{-a.s.} \]
This completes the proof. □
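The final step of the proof, that the quadratic \(\Phi(t,u)\) attains its maximum at \(u=0\) exactly when the linear part vanishes and the quadratic form is negative semidefinite, can be illustrated numerically. The 2-d data below are purely hypothetical and only check the "if" direction on a grid.

```python
# Toy check: Phi(u) = <U, u> + (1/2) <V u, u> is maximized at u = 0 when
# U = 0 and V is negative definite (here V has eigenvalues < 0).
import itertools

def phi(U, V, u):
    m = len(u)
    lin = sum(U[i] * u[i] for i in range(m))
    quad = sum(u[i] * V[i][j] * u[j] for i in range(m) for j in range(m))
    return lin + 0.5 * quad

U = [0.0, 0.0]                  # first-order condition U(t;t) = 0
V = [[-1.0, 0.2], [0.2, -0.5]]  # negative definite (second-order condition)

grid = [x * 0.5 for x in range(-4, 5)]
vals = [phi(U, V, [a, b]) for a, b in itertools.product(grid, repeat=2)]
print(max(vals) == phi(U, V, [0.0, 0.0]))  # True: maximum attained at u = 0
```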
It is worth noting that, under the positive-semidefiniteness conditions on the coefficients Q(·), G and R(·,·), the corresponding process P(·) in [15] and [14] is indeed positive semidefinite, by the comparison principles for BSDEs. As a consequence of Theorem 9, a necessary and sufficient condition for a control to be an equilibrium strategy is then just the first-order equilibrium condition (3.11). However, there is a significant difference between the estimate for the cost functional presented here and those in [15] and [14]: because stochastic coefficients and random jumps of the controlled system are taken into account, an additional term Γ(·,·) occurs in the equation for P(·), so in this paper P(·) is not necessarily positive semidefinite. This is why we modify the methodology for deriving the sufficient condition for equilibrium controls. We thus have the following corollary, whose proof follows the same arguments as that of Proposition 3.2 in [30].
Let (H1)–(H2) hold. Given an admissible control \(\hat u(\cdot)\in L_{\mathbb F,p}^2(0,T;\mathbb R^m)\), let \(\big(p(\cdot;\cdot),q(\cdot;\cdot),r(\cdot,\cdot;\cdot),l(\cdot;\cdot)\big)\in\mathbb L\times L_{\mathbb F}^2(0,T;(\mathbb R^n)^p)\times L_{\mathbb F,p}^{\theta,2}([0,T]\times\mathbb R^*;(\mathbb R^n)^l)\times L_{\mathbb F,p}^{\lambda,2}(0,T;(\mathbb R^n)^d)\) be the unique solution to the BSDE (3.3). Then \(\hat u(\cdot)\) is an equilibrium if the following condition holds dP-a.s., dt-a.e.:
\[ B^\top p(t;t)+\sum_{i=1}^{p}D_i^\top q_i(t)+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top r_k(t,z)\,\theta_\alpha^k(dz)-R\hat u(t)=0. \]
Linear feedback stochastic equilibrium control
In this subsection, our goal is to obtain a state feedback representation of an equilibrium control for Problem (N) via some class of ordinary differential equations.
Now, suppressing the arguments (s,ei) in the coefficients A, B, b, Ci, Di, σi, and writing ϱ(z) instead of ϱ(s,z,ei) for ϱ=Ek,Fk,ck, we first consider, for any deterministic differentiable function \(\eta\in C([0,T]\times\chi;\mathbb R^{n\times n})\), the differential-difference operator
\[ \mathcal L\eta(s,e_i)=\eta'(s,e_i)+\sum_{j=1}^{d}\lambda_{ij}\big[\eta(s,e_j)-\eta(s,e_i)\big]. \]
Then we introduce the following system of differential-difference equations, for s∈0,T:
\[
\begin{aligned}
0={}&\mathcal LM(s,e_i)+M(s,e_i)A+A^\top M(s,e_i)+\sum_{i=1}^{p}C_i^\top M(s,e_i)C_i\\
&-\Big[M(s,e_i)B+\sum_{i=1}^{p}C_i^\top M(s,e_i)D_i+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)^\top M(s,e_i)F_k(z)\,\theta_\alpha^k(dz)\Big]\Psi(s,e_i)\\
&+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)^\top M(s,e_i)E_k(z)\,\theta_\alpha^k(dz)+Q,\\
0={}&\mathcal L\bar M(s,e_i)+\bar M(s,e_i)A+A^\top\bar M(s,e_i)-\bar M(s,e_i)B\Psi(s,e_i)+\bar Q,\\
0={}&\mathcal L\Upsilon(s,e_i)+A^\top\Upsilon(s,e_i),\\
0={}&\mathcal L\varphi(s,e_i)+A^\top\varphi(s,e_i)+\big(M(s,e_i)+\bar M(s,e_i)\big)\big(b-B\psi(s,e_i)\big)\\
&+\sum_{i=1}^{p}C_i^\top M(s,e_i)\big(\sigma_i-D_i\psi(s,e_i)\big)+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)^\top M(s,e_i)\big(c_k(z)-F_k(z)\psi(s,e_i)\big)\,\theta_\alpha^k(dz),\\
&M(T,e_i)=G;\qquad \bar M(T,e_i)=\bar G;\qquad \Upsilon(T,e_i)=\mu_1;\qquad \varphi(T,e_i)=\mu_2,
\end{aligned}\tag{3.23}
\]
where Ψ·,· and ψ·,· are given by
\[
\begin{aligned}
\Psi(s,e_i)\triangleq{}&\Theta(s,e_i)\Big[B^\top\big(M(s,e_i)+\bar M(s,e_i)+\Upsilon(s,e_i)\big)+\sum_{i=1}^{p}D_i^\top M(s,e_i)C_i+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top M(s,e_i)E_k(z)\,\theta_\alpha^k(dz)\Big],\\
\psi(s,e_i)\triangleq{}&\Theta(s,e_i)\Big[B^\top\varphi(s,e_i)+\sum_{i=1}^{p}D_i^\top M(s,e_i)\sigma_i+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top M(s,e_i)c_k(z)\,\theta_\alpha^k(dz)\Big],
\end{aligned}\tag{3.24}
\]
with
\[ \Theta(s,\cdot)=\Big[R+\sum_{i=1}^{p}D_i^\top M(s,\cdot)D_i+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top M(s,\cdot)F_k(z)\,\theta_\alpha^k(dz)\Big]^{-1}. \]
The following theorem presents the existence condition for a linear feedback equilibrium control.
Let (H1)–(H2) hold. Suppose that the system of equations (3.23) admits a solution \((M(\cdot,e_i),\bar M(\cdot,e_i),\Upsilon(\cdot,e_i),\varphi(\cdot,e_i))\) in \(C([0,T];\mathbb R^{n\times n})\) for every \(e_i\in\chi\). Then the time-inconsistent LQ Problem (N) has an equilibrium control that can be represented in the state-feedback form
\[ \hat u(t)=-\Psi(t,\alpha(t))\hat X(t-)-\psi(t,\alpha(t)),\tag{3.25} \]
where Ψ(·,·) and ψ(·,·) are given by (3.24).
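In general the system (3.23) must be integrated numerically backward from its terminal conditions. The sketch below is a heavily simplified illustration of that backward integration for the Riccati-type equation for M only, under assumptions not made in the paper: scalar state, one Brownian motion, no jumps, a single regime (so the difference part of the operator \(\mathcal L\) vanishes), and illustrative coefficient values. Substituting \(\tau=T-s\) turns the terminal-value problem into an initial-value problem, which is then solved by classical RK4.

```python
# Backward RK4 integration sketch for a scalar LQ Riccati equation,
# a toy stand-in for the first equation of the system (3.23).
def riccati_rhs(M, A=0.1, B=1.0, C=0.2, D=0.3, Q=1.0, R=1.0):
    """dM/ds for the scalar Riccati equation (before time reversal)."""
    num = (B * M + C * D * M) ** 2
    return -(2 * A * M + C * C * M + Q - num / (R + D * D * M))

def solve_backward(G=1.0, T=1.0, n=1000):
    """Integrate M' = riccati_rhs backward from M(T) = G; return M(0)."""
    h, M = T / n, G
    for _ in range(n):
        f = lambda m: -riccati_rhs(m)  # d(M)/d(tau) with tau = T - s
        k1 = f(M)
        k2 = f(M + 0.5 * h * k1)
        k3 = f(M + 0.5 * h * k2)
        k4 = f(M + h * k3)
        M += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return M

M0 = solve_backward()
print(M0 > 0)  # positivity is preserved here for Q, R, G > 0
```

In the regime-switching case one would instead integrate d coupled copies of such equations, one per state \(e_i\), coupled through the difference term \(\sum_j \lambda_{ij}[\eta(s,e_j)-\eta(s,e_i)]\) of \(\mathcal L\).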
Uniqueness of the equilibrium control
In this subsection, we prove that if the system of equations (3.23) is solvable, then the state feedback equilibrium control given by (3.25) is the unique open-loop Nash equilibrium control of Problem (N).
Let (H1)–(H2) hold. Suppose that M(·,·), M̄(·,·), Υ(·,·) and φ(·,·) are solutions to the system (3.23). Then \(\hat u(\cdot)\) given by (3.25) is the unique open-loop Nash equilibrium control for Problem (N).
Applications
In this section, we discuss an extension of a new class of optimization problems [36], in which the investor manages her/his wealth by consuming and investing in a financial market, subject to a mean-variance criterion controlling the final risk of the portfolio. This problem can be formulated as a time-inconsistent stochastic LQ problem and solved by the results presented in the preceding sections.
Conditional mean-variance-utility consumption–investment and reinsurance problem
We study equilibrium reinsurance (and possibly new-business), investment and consumption strategies for the mean-variance-utility portfolio problem, where the surplus of the insurer is assumed to follow a jump-diffusion model. The financial market consists of one riskless asset and one risky asset whose price processes are described by regime-switching SDEs. The problem is formulated as follows. Consider an insurer whose surplus process is described by the jump-diffusion model
\[ d\Lambda(s)=c\,ds+\beta_0\,dW_1(s)-d\Big(\sum_{i=1}^{N_\alpha(s)}Y_i\Big),\quad s\in[0,T],\tag{4.1} \]
where c>0 is the premium rate, β0 is a positive constant, W1 is a one-dimensional standard Brownian motion, Nα is a Poisson process with intensity λ>0, and \(\{Y_i\}_{i\in\mathbb N\setminus\{0\}}\) is a sequence of independent and identically distributed positive random variables with common distribution \(P_Y\) having finite first and second moments \(\mu_Y=\int_0^\infty z\,P_Y(dz)\) and \(\sigma_Y=\int_0^\infty z^2\,P_Y(dz)\). We assume that W1, Nα and \(\sum_{i=1}^{N_\alpha(\cdot)}Y_i\) are independent. Let Y be a generic random variable with the same distribution as the Yi. The premium rate c is assumed to be calculated via the expected value principle, i.e. \(c=(1+\eta)\lambda\mu_Y\) with safety loading η>0.
Note that the process ∑i=1NαsYi can also be defined through a random measure Nα1ds,dz as
∑i=1NαsYi=∫0s∫0∞zNα1dr,dz,
where Nα1 is a finite Poisson random measure with a random compensator having the form θα1dzds=λPYdzds. We recall that N˜α1ds,dz=Nα1ds,dz−θα1dzds defines the compensated jump martingale random measure of Nα1. Obviously, we have
∫0+∞zθα1dzds=λ∫0+∞zPYdzds=λμYds.
Hence (4.1) is equivalent to
dΛs=ηλμYds+β0dW1s−∫0+∞zN˜α1ds,dz.
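As a quick numerical illustration, the surplus dynamics above can be approximated with an Euler scheme: premium inflow at rate c, a Brownian perturbation, and compound Poisson claims. All parameter values and the exponential claim-size law in this sketch are hypothetical and not taken from the paper's numerical example.

```python
import numpy as np

# Euler simulation of the surplus process (4.2); parameters are illustrative.
eta, lam, mu_Y = 0.2, 0.65, 0.6        # safety loading, claim intensity, E[Y]
beta0, T, n = 0.5, 1.0, 1000
c = (1 + eta) * lam * mu_Y             # expected value principle: c = (1+eta)*lam*mu_Y
dt = T / n
rng = np.random.default_rng(0)

Lam = np.empty(n + 1)
Lam[0] = 1.0                           # initial surplus (hypothetical)
for k in range(n):
    dW = rng.normal(0.0, np.sqrt(dt))                  # Brownian increment
    n_claims = rng.poisson(lam * dt)                   # claims arriving in (t, t+dt]
    claims = rng.exponential(mu_Y, n_claims).sum()     # i.i.d. claim sizes with mean mu_Y
    Lam[k + 1] = Lam[k] + c * dt + beta0 * dW - claims
```

The compensated form of the dynamics is recovered on average, since the expected claim outflow per unit time is λμY and the net drift is c−λμY=ηλμY.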
Suppose that the insurer is allowed to invest its wealth in a financial market, in which two securities are traded continuously. One of them is a bond with price S0s at time s∈0,T governed by
dS0s=r0s,αsS0sds,S00=s0>0.
There is also a risky asset with unit price S1s at time s∈0,T governed by
dS1s=S1s−σs,αsds+βs,αsdW2s+∫−1+∞zNα2ds,dz−θα2dzds,S10=s1>0,
where r0,σ,β:0,T×X→0,∞ are assumed to be deterministic and continuous functions such that σs,αs>r0s,αs>0, W2· is a one-dimensional standard Brownian motion, and Nα2 is a finite Poisson random measure with random compensator having the form nα2ds,dz=θα2dzds. We assume that W1·, W2·, Nα1·,· and Nα2·,· are independent and that θα2· is a Lévy measure on −1,+∞ such that ∫−1+∞z2θα2dz<∞.
The insurer, starting from an initial capital x0>0 at time 0, is allowed to dynamically purchase proportional reinsurance (acquire new business), invest in the financial market and consume. A trading strategy u· is described by a three-dimensional stochastic process u1·,u2·,u3·⊤. The strategy u1s≥0 represents the retention level of reinsurance or new business acquired at time s∈0,T. We point out that u1s∈0,1 corresponds to a proportional reinsurance cover and shows that the cedent should divert part of the premium to the reinsurer at the rate of 1−u1tθ0+1λμY, where θ0 is the relative safety loading of the reinsurer satisfying θ0≥η. Meanwhile, for each claim Y occurring at time s, the reinsurer pays 1−u1tY of the claim, and the cedent pays the rest. The case u1s∈1,+∞ corresponds to acquiring new business. u2s≥0 represents the amount invested in the risky stock at time s. The dollar amount invested in the bond at time s is Xx0,ei0,u·s−u2s, where Xx0,ei0,u·· is the wealth process associated with the strategy u· and the initial states x0,ei0, and u3s represents the consumption rate at time s∈0,T. Thus, incorporating the reinsurance/new business and investment strategies into the surplus process and the risky asset, respectively, we consider, as time evolves, the controlled stochastic differential equation parametrized by t,ξ,ei∈0,T×L2Ω,Ftα,P;R×χ and satisfied by X·: for s∈0,T,
dXs=r0s,αsXs+δ+θ0u1sλμY+rs,αsu2sdsdXs=−u3sds+β0u1sdW1s+βs,αsu2sdW2sdXs=−u1s−∫0+∞zN˜α1ds,dz+u2s−∫−1+∞zN˜α2ds,dz,Xt=ξ,αt=ei,
where rs,αs=σs,αs−r0s,αs and δ=η−θ0. Then, for any t,ξ,ei∈0,T×L2Ω,Ftα,P;R×χ the mean-variance-utility consumption–investment and reinsurance optimization problem is reduced to maximization of the utility function J(t,ξ,ei;·) given by
Jt,ξ,ei;u·=E∫tT12hs−tu3(s)2ds+12VarXTFTα−μ1ξ+μ2EXTFTα,
subject to (4.5), where h·:[0,T]→R is a general deterministic nonexponential discount function satisfying h(0)=1, h(s)>0 ds-a.e. and ∫0Th(s)ds<∞. In this paper we consider general discount functions satisfying the above assumptions. Some possible examples of discount functions are considered in the literature; see [42] and [10].
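For concreteness, one admissible choice (a sketch in line with the generalized hyperbolic discounting discussed in the cited literature, not a function singled out by this paper) is

```latex
\[
h(s) = (1 + k s)^{-\beta / k}, \qquad k,\beta > 0,
\]
```

which satisfies $h(0)=1$, $h(s)>0$ and $\int_0^T h(s)\,ds<\infty$; exponential discounting $h(s)=e^{-\beta s}$ is recovered in the limit $k\to 0$, and any $k>0$ makes the criterion time-inconsistent.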
Similar to [19] and [21], due to the presence of the observable random factor α·, we consider the expectation of a conditional mean-variance criterion in the above cost functional. This is different from the mean-variance portfolio selection problem with regime switching considered in [41] and [5]. In [21], a conditional mean-variance portfolio selection problem with common noise is proposed and solved using the linear-quadratic optimal control of the conditional McKean–Vlasov equation with random coefficients and dynamic programming approach.
With n=1, p=l=m=3, the optimal control problem associated with (4.5) and (4.6) is equivalent to maximization of
Jt,ξ,ei;u.=E∫tT12hs−tΓ⊤Γu(s),usds+12VarXTFTα−μ1ξ+μ2EXTFTα,
subject to (2.1). Here A=r0s,αs, B=λμYθ0rs,αs−1, b=δλμY, D1=β000, D2=0βs,αs0, Q=0, Q¯=0, F1z=−z10,∞z00, F2z=0z1−1,∞z0, Γ=001, Rt,s=hs−tΓ⊤Γ, G=1, G¯=−1, Ci=0, σi=0, Ekz=0 and ckz=0. Thus, the above model is a special case of the general time-inconsistent LQ problem formulated earlier in this paper. Then we apply Corollary 12 and Theorem 13 to obtain the unique Nash equilibrium trading strategy. Define
ρs,αs≜λμYθ02β02+∫0+∞z2θα1dz+rs,αs2βs,αs2+∫−1+∞z2θα2dz.
Then the system (3.23) reduces to the following: for s∈0,T,
M′s,ei+Ms,ei2r0s,ei−Υs,ei+λii−ρs,eiΥs,ei+∑j≠idλijMs,ej=0,M¯′s,ei+M¯s,ei2r0s,ei−Υs,ei+λii−ρs,eiΥs,ei+∑j≠idλijM¯s,ej=0,Υ′s,ei+Υs,eir0s,ei+λii+∑j≠idλijΥs,ej=0,φ′s,ei+φs,eir0s,ei+λii+∑j≠idλijφs,ej=0,MT,ei=1,M¯T,ei=−1,ΥT,ei=−μ1,φT,ei=−μ2.
By standard arguments, we obtain, for s∈0,T and ei∈X,
Ms,ei=e∫sT2r0τ,ei−Υτ,ei+λiidτ1+∫sTe−∫τT2r0u,ei−Υu,ei+λiidu−ρτ,eiΥτ,ei+∑j≠idλijMτ,ejdτ,=M¯s,ei,
also we have, for ei∈X,
Υs,ei=e∫sTr0τ,ei+λiidτ×−μ1+∫sTe∫τT−r0u,ei+λiidu∑j≠idλijΥτ,ejdτ
and
φs,ei=e∫sTr0τ,ei+λiidτ×−μ2+∫sTe∫τT−r0u,ei+λiidu∑j≠idλijφτ,ejdτ.
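The Υ-system above is a linear backward ODE system that can also be integrated numerically from the terminal condition. The sketch below uses a hand-rolled RK4 scheme with constant coefficients for a two-state chain; the generator and the rates r0 are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

# Backward RK4 integration of the linear ODE system for Upsilon(s, e_i):
#   Upsilon'(s,e_i) + Upsilon(s,e_i)(r0(s,e_i) + lambda_ii)
#                   + sum_{j != i} lambda_ij Upsilon(s,e_j) = 0,
#   Upsilon(T,e_i) = -mu1.
T, mu1, n = 1.0, 1.0, 2000
r0 = np.array([0.35, 0.40])                 # r0(., e1), r0(., e2)  (assumed constant)
H = np.array([[-2.0, 2.0], [4.0, -4.0]])    # generator of the chain (hypothetical)
dt = T / n

def f(Y):
    # Upsilon'_i = -r0_i * Upsilon_i - (H @ Upsilon)_i
    return -(r0 * Y + H @ Y)

Y = np.array([-mu1, -mu1])                  # terminal condition at s = T
for _ in range(n):                          # classical RK4 with step -dt
    k1 = f(Y)
    k2 = f(Y - 0.5 * dt * k1)
    k3 = f(Y - 0.5 * dt * k2)
    k4 = f(Y - dt * k3)
    Y = Y - dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
Upsilon0 = Y                                # Upsilon(0, e1), Upsilon(0, e2)
```

By Feynman–Kac, each component of −Υ(0,·) equals a conditional expectation of the accumulated discount factor, so it must lie between the two pure-regime values e^{0.35T} and e^{0.40T}, which gives a quick sanity check on the integration.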
In view of Theorem 13, the Nash equilibrium control (3.25) gives, for s∈0,T, uˆ1s=−∑i=1dαs−,eiλμYθ0β02+∫0+∞z2θα1dzΦ1s,eiXˆs+Φ2s,ei,uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2+∫−1+∞z2θα2dzΦ1s,eiXˆs+Φ2s,ei,uˆ3s=∑i=1dαs−,eiΥs,eiXˆs+φs,ei, where ∀s,ei∈0,T×XΦ1s,ei=e∫sT−r0τ,ei+Υτ,eidτ−μ1+∫sTe∫τT−r0u,ei+λiidu∑j≠idλijΥτ,ejdτ1+∫sTe−∫τT2r0u,ei−Υu,ei+λiidu−ρτ,eiΥτ,ei+∑j≠idλijMτ,ejdτ,
and
Φ2s,ei=e∫sT−r0τ,ei+Υτ,eidτ−μ2+∫sTe∫τT−r0u,ei+λiidu∑j≠idλijφτ,ejdτ1+∫sTe−∫τT2r0u,ei−Υu,ei+λiidu−ρτ,eiΥτ,ei+∑j≠idλijMτ,ejdτ.
The conditional expectation of the corresponding equilibrium wealth process solves the equation
dEXˆsFTα=P1s,αsEXˆsFTα+P2s,αsds,EXˆ0FTα=x0,
where
P1s,αs=r0s,αs−ρs,αsΦ1s,αs−Υs,αs,P2s,αs=−ρs,αsΦ2s,αs−φs,αs+bs,αs.
Technical computations show that
dEXˆs2FTα=2P1s,αs+P3s,αsEXˆs2FTαdEXˆs2FTα=+2P2s,αs+P4s,αsEXˆsFTαdEXˆs2FTα=+P5s,αsds,EXˆ02FTα=x02,
and
dVarXˆsFTα=2P1s,αsVarXˆsFTα+P3s,αsEXˆs2FTα+2P4s,αsEXˆsFTα+P5s,αsds,VarXˆ0FTα=0,
where
P3s,αs=ρs,αsΦ1s,αs2,P4s,αs=ρs,αsΦ1s,αsΦ2s,αs,P5s,αs=ρs,αsΦ2s,αs2.
Then
EXˆsFTα=∑i=1dαs−,eie∫0sP1τ,eidτEXˆsFTα=×x0+∫0se∫0τ−P1u,eiduP2τ,eidτ,EXˆs2FTα=∑i=1dαs−,eie∫0s2P1τ,ei+P3τ,eidτEXˆs2FTα=×x02+∫0se∫0τ−2P1u,ei+P3u,eiduEXˆs2FTα=×2P2τ,ei+P4τ,eiEXˆτFTα+P5τ,eidτ,
and
VarXˆsFTα=∑i=1dαs−,eie∫0s2P1τ,eidτ∫0se∫0τ−2P1u,eiduP3τ,eiEXˆτ2FTα+2P4τ,eiEXˆτFTα+P5τ,eidτ.
Hence the objective function value for the equilibrium trading strategy uˆ· is
J0,x0,ei0;uˆ·=E∑i=1dαT,ei∫0T12hsΥs,eiXˆs+φs,ei2ds+12e∫0T2P1τ,eidτ∫0Te∫0τ−2P1u,eiduP3τ,eiEXˆτ2FTα+2P4τ,eiEXˆτFTα+P5τ,eidτ−μ1x0+μ2e∫0TP1τ,eidτx0+∫0Te∫0τ−P1u,eiduP2τ,eidτ.
Conditional mean-variance investment and reinsurance strategies
In this subsection, we will address a special case where the insurer does not take into account the consumption strategy. The objective is to maximize the conditional expectation of terminal wealth EXTFTα and at the same time to minimize the conditional variance of the terminal wealth VarXTFTα, over controls u· valued in R2. Then, the mean-variance investment and reinsurance optimization problem is defined as minimizing the cost Jt,ξ,ei;· given by
Jt,ξ,ei;u·=12EVarXTFTα−μ1ξ+μ2EXTFTα,
subject to, for s∈0,T,
dXs=r0s,αsXs+δ+θ0u1sλμY+rs,αsu2sdsdXs=+β0u1sdW1s+βs,αsu2sdW2sdXs=−u1s−∫0+∞zN˜α1ds,dz+u2s−∫−1+∞zN˜α2ds,dz,Xt=ξ,αt=ei,
where t,ξ,ei∈0,T×L2Ω,Ftα,P;R×χ and u·=u1·,u2·⊤ is an admissible trading strategy.
In this case, the equilibrium strategy given by the expressions (4.10) and (4.11) changes to, for s∈0,T, uˆ1s=−∑i=1dαs−,eiλμYθ0β02+∫0+∞z2θα1dzΦ1s,eiXˆs+Φ2s,ei,uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2+∫−1+∞z2θα2dzΦ1s,eiXˆs+Φ2s,ei, where ∀s,ei∈0,T×XΦ1s,ei=e∫sT−r0τ,eidτ−μ1+∫sTe∫τT−r0u,eidu∑j≠idλijΥτ,ejdτ1+∫sTe−∫τT2r0u,ei+λiidu−ρτ,eiΥτ,ei+∑j≠idλijMτ,ejdτ,Φ2s,ei=e∫sT−r0τ,eidτ−μ2+∫sTe∫τT−r0u,eidu∑j≠idλijφτ,ejdτ1+∫sTe−∫τT2r0u,ei+λiidu−ρτ,eiΥτ,ei+∑j≠idλijMτ,ejdτ.
Numerical example. In this subsection we provide a numerical example to demonstrate the validity and performance of the proposed study in solving the mean-variance problem with Markov switching. For simplicity, let us consider Equation (4.16), in which the Markov chain takes two possible states e1=1 and e2=2, i.e. χ=1,2, with the generator of the Markov chain being
H=(−2, 2; 4, −4)
and the initial condition X0=1.1. For illustration purposes, we assume the finite time horizon is T=60 and that the coefficients of the dynamic equation are given in the following table:
        r0(αt)   r(αt)   β(αt)   δ      θ0    β0    λ      μY
αt=1    0.35     0.20    0.30    0.09   1.5   0.5   0.65   0.6
αt=2    0.40     0.25    0.55    0.09   1.5   0.5   0.65   0.6
We consider the cost function defined by Equation (4.15) with μ1=μ2=1. For brevity, we use the notation EX(t,i) for EXˆtFTi, where i=1,2 and α.
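A sample path of the chain α(·) underlying this example can be generated from its exponential holding times. In the sketch below, the exit rates 2 (state 1) and 4 (state 2) are read off the generator H above; the random seed is arbitrary.

```python
import numpy as np

# Simulate the two-state Markov chain alpha(.) on [0, 60].
rng = np.random.default_rng(1)
rates = {1: 2.0, 2: 4.0}               # exit rates taken from the generator H
T = 60.0
t, state = 0.0, 1                      # alpha(0) = 1
jump_times, states = [0.0], [1]
while True:
    t += rng.exponential(1.0 / rates[state])   # exponential holding time
    if t >= T:
        break
    state = 3 - state                          # two states: every jump switches 1 <-> 2
    jump_times.append(t)
    states.append(state)
```

The resulting piecewise-constant path plays the role of the switching signal driving the regime-dependent coefficients in the wealth dynamics.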
Figure 1: The state change of the Markov chain.
Figure 2: Expected equilibrium wealth in the three modes for i=1,2 and α.
Figure 3: Trajectories of the equilibrium wealth corresponding to the Markov chain.
Figure 1 depicts the state change of the Markov chain α(·) between 0 and 60 units of time, where the initial state is assumed to be α(0)=1.
Figure 2 presents the curves of the different state trajectories of the equilibrium expected wealth EX(t,i) in the three modes: i=1, i=2 and i=αt. By using Matlab’s ODE solvers (in particular the function ode45) and the Markov chain α·, we obtain the trajectories of EX(t,1), EX(t,2) and EX(t,αt) and their graphs: the dashed blue line is the graph of EX(t,1), the continuous brown line is the graph of EX(t,2), and the solid black line is the graph of EX(t,αt), whose values switch between the dashed blue line and the continuous brown line.
Figure 3 shows the state trajectory of the equilibrium wealth X(·). In fact, when α0=1, X(0)=1.1 is the initial state. The values then switch between two paths, which are the trajectories of the equilibrium wealth corresponding to the different states of the Markov chain: αt=1 and αt=2. As a result, by comparing with Figure 1, we can clearly see how the Markovian switching influences the overall behavior of the state trajectories of the equilibrium wealth.
Special cases and relationship to other worksClassical Cramér–Lundberg model
Now, assume that the insurer’s surplus is modelled by the classical Cramér–Lundberg (CL) model (i.e. the model (4.2) with β0=0), and that the financial market consists of one risk-free asset whose price process is given by (4.3), and only one risky asset whose price process does not have jumps and is modelled by a diffusion process (i.e. the model (4.4) with z=0,ds-a.e.). Then the dynamics of the wealth process X·=Xt,ξ,ei·;u· which corresponds to an admissible strategy u·=u1·,u2·⊤ and initial triplet t,ξ,ei∈0,T×L2Ω,Ftα,P;R×X can be described, for s∈t,T, by
dXs=r0s,αsXs+δ+θ0u1sλμY+rs,αsu2sdsdXs=+βs,αsu2sdW2s−u1s−∫0+∞zN˜α1ds,dz,Xt=ξ,αt=ei.
We derive the equilibrium strategy for the following two cases.
Case 1: μ1=0. We suppose that μ1=0 and μ2=1γ, such that γ>0. Then the minimization problem (4.15) reduces to
minJt,ξ,ei;u·=E12VarXTFTα−1γEXTFTα,
subject to u·∈LF,p20,T;R2, where X·=Xt,ξ,ei·;u· satisfies (4.21), for every t,xt,ei∈0,T×L2Ω,Ftα,P;R×χ. In this case the equilibrium reinsurance–investment strategy given by (4.17) and (4.18) for s∈0,T becomes uˆ1s=−∑i=1dαs−,eiλμYθ0∫0+∞z2θα1dzΦ1s,eiXˆs+Φ2s,ei,uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2Φ1s,eiXˆs+Φ2s,ei, where Φ1s,ei and Φ2s,ei are given by (4.19) and (4.20) for μ1=0 and μ2=1γ.
In the absence of the Markov chain, i.e. when d=1, ℓs,αs≡ℓs for ℓ=r0,r and β, the equilibrium solution (4.23) and (4.24) for s∈0,T reduces to
uˆ1s=λμYθ0e∫sT−r0τdτγ∫0+∞z2θ1dz,uˆ2s=rse∫sT−r0τdτγβs2.
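These closed forms are straightforward to evaluate. The sketch below assumes constant coefficients; all numerical values, including the second claim moment σY (so that ∫0∞z²θ1(dz)=λσY), are hypothetical illustrations.

```python
import numpy as np

# Closed-form equilibrium reinsurance/investment without regime switching.
lam, mu_Y, sigma_Y = 0.65, 0.6, 0.72   # claim intensity, E[Y], E[Y^2] (assumed)
theta0, gamma = 1.5, 2.0               # reinsurer loading, risk aversion
r0, r, beta = 0.35, 0.20, 0.30         # constant market coefficients
T = 1.0

def u1(s):
    disc = np.exp(-r0 * (T - s))       # e^{-int_s^T r0 dtau} for constant r0
    return lam * mu_Y * theta0 * disc / (gamma * lam * sigma_Y)

def u2(s):
    disc = np.exp(-r0 * (T - s))
    return r * disc / (gamma * beta ** 2)
```

Both strategies are deterministic, decreasing in the risk aversion γ, and discounted back from the horizon by the risk-free rate, which matches the structure of the displayed formulas.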
It is worth pointing out that the above equilibrium solutions are identical to the ones found in Zeng and Li [43] by solving some extended HJB equations.
Case 2: μ2=0. Now, suppose that μ1=1γ and μ2=0, such that γ>0. Then the minimization problem (4.15) reduces to
minJt,ξ,ei;u·=E12VarXTFTα−ξγEXTFTα,
for any t,xt,ei∈0,T×L2Ω,Ftα,P;R×χ. This is the case of the mean-variance problem with state dependent risk aversion. For this case the equilibrium reinsurance–investment strategy given by (4.17) and (4.18) for s∈0,T, reduces to uˆ1s=−∑i=1dαs−,eiλμYθ0∫0+∞z2θα1dzΦ1s,eiXˆs+Φ2s,ei,uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2Φ1s,eiXˆs+Φ2s,ei, where Φ1s,ei and Φ2s,ei are given by (4.19) and (4.20) for μ1=1γ and μ2=0.
In the absence of the Markov chain the equilibrium solution reduces for s∈0,T to uˆ1s=λμYθ0e∫sT−r0τdτXˆs∫0+∞z2θ1dzγ+∫sTe−∫τTr0uduρτdτ,uˆ2s=rse∫sT−r0τdτXˆsβs2γ+∫sTe−∫τTr0uduρτdτ.
The equilibrium reinsurance–investment solution presented above is comparable to that found in Li and Li [17], in which the equilibrium is, however, defined within the class of feedback controls. Note that in [17] the authors adopted the approach developed by Björk et al. [4] and obtained feedback equilibrium solutions via some well-posed integral equations.
The investment only
In this subsection, we consider the investment-only optimization problem. In this case the insurer does not purchase reinsurance or acquire new business, which means that u1s≡1, and his consumption is not taken into account. We assume that the financial market consists of one risk-free asset whose price process is given by (4.3), and only one risky asset whose price process does not have jumps. A trading strategy u· reduces to a one-dimensional stochastic process u2· in this case, where u2s represents the amount invested in the risky stock at time s. The dynamics of the wealth process X· which corresponds to an admissible investment strategy u2· and initial triplet t,ξ,ei∈0,T×L2Ω,Ftα,P;R×X can be described by
dXs=r0s,αsXs+δλμY+rs,αsu2sds+β0dW1sdXs=+βs,αsu2sdW2s−∫0+∞zN˜α1ds,dz,fors∈t,T,Xt=ξ,αt=ei.
Similar to the previous subsection, for the investment-only case we derive the equilibrium strategy which is described in the following two cases.
Case 1: μ1=0. We suppose that μ1=0 and μ2=1γ, such that γ>0. In this case the equilibrium investment strategy given by (4.17) becomes
uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2Φ1s,eiXˆs+Φ2s,ei,s∈0,T,
where Φ1s,ei and Φ2s,ei are given by (4.19) and (4.20) for μ1=0 and μ2=1γ.
In the absence of the Markov chain the equilibrium solution reduces to
uˆ2s=rse∫sT−r0τdτγβs2,s∈0,T.
This essentially covers the solution obtained by Björk and Murgoci [3] by solving some extended HJB equations.
Case 2: μ2=0. Now, suppose that μ1=1γ and μ2=0, such that γ>0. This is the case of the mean-variance problem with state-dependent risk aversion. For this case the equilibrium investment strategy given by (4.17) reduces to
uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2Φ1s,eiXˆs+Φ2s,ei,s∈0,T,
where Φ1s,ei and Φ2s,ei are given by (4.19) and (4.20) for μ1=1γ and μ2=0.
In the absence of the Markov chain the equilibrium solution reduces to
uˆ2s=rse∫sT−r0τdτXˆsβs2γ+∫sTe−∫τTr0uduρτdτ,s∈0,T.
This essentially covers the solution obtained by Hu et al. [15].
Conclusion
In this paper, we have considered a class of dynamic decision models of conditional time-inconsistent LQ type, under the effect of a Markovian regime-switching. We have employed the game theoretic approach to handle the time inconsistency. Throughout this study open-loop Nash equilibrium strategies are established as an alternative to optimal strategies. This was achieved using a stochastic system that includes a flow of forward-backward stochastic differential equations under equilibrium conditions. The inclusion of concrete examples in mathematical finance confirms the validity of our proposed study. The work may be developed in different ways:
The methodology may be expanded, for example, to a non-Markovian framework, implying that the coefficients of the controlled SDE as well as the coefficients of the objective functional are random. The research on this topic is in progress and will be covered in our forthcoming paper.
As the reviewer suggests, the model discussed in this paper may be extended to progressively measurable controls as an alternative to predictable ones, and the question of how to obtain the corresponding state feedback equilibrium strategy is a very interesting and challenging one (see [29] for more details). Some further investigations will be carried out in our future publications.
Acknowledgments
We would like to thank the anonymous reviewer and the Editor for their constructive comments and suggestions on an earlier version of this paper, which led to a considerable improvement of the presentation of the work.
AppendixProofs and technical results
As the coefficients are affected by random Markov switching and since we consider a family of a continuum of random variables (conditional expectations) parametrized by ε>0, the limit in (3.2) is taken along a sequence εn tending to 0, rather than ε tending to 0; see Definition 4. Due to the uncountable cardinality of the set of ε>0, the a.s. limit with respect to all ε>0 may not make sense, which is the reason for using εn instead. We should consider a subsequence for the limit procedures in the proofs. To do so, we use the following lemma, which was proved by Wang in [32], Lemma 3.3.
If f·=(f1·,…,fm·)∈LFp(0,T;Rm) with m∈N and p>1, then for dt-a.e. t there exists a sequence {εnt}n∈N⊂(0,T−t) depending on t such that limn→∞εnt=0 and limn→∞1εntE∫tt+εntfis−fitpds=0, for i=1,…,m, dP-a.s.
It is clear that ϕs,αs is invertible for all s∈0,T. We denote by ϕs,αs−1 the inverse of ϕs,αs. Define for t∈0,T and s∈t,T the process
p¯s;t≡−ϕs,αsps;t−G¯EXˆTFTα−μ1Xˆt−μ2−∫sTϕτ,ατQ¯EXˆτFταdτ,
and q¯is;t,r¯ks,z;t,l¯js;t=−ϕs,αsqis;t,rks,z;t,ljs;t, for i=1,2,…,p; k=1,2,…,l and j=1,2,…,d. Then for any t∈0,T, in the interval t,T, the 4-tuple p¯·;t,q¯·;t,r¯·,·;t,l¯·;t satisfies
dp¯s;t=−∑i=1pϕs,αsCi⊤ϕs,αs−1q¯is;tdp¯s;t=+∑k=1l∫R∗ϕs,αsEkz⊤ϕs,αs−1r¯ks,z;tθαkdzdp¯s;t=+ϕs,αsQXˆsds+∑i=1pq¯is;tdWisdp¯s;t=+∑k=1l∫R∗r¯ks−,z;tN˜αkds,dz+∑j=1dl¯js,tdΦ˜js,p¯T;t=GXˆT.
Moreover, it is clear that for any t1,t2,s∈0,T such that 0<t1<t2<s<T, we have
p¯s;t1,q¯is;t1,r¯ks,z;t1,l¯js;t1=p¯s;t2,q¯is;t2,r¯ks,z;t2,l¯js;t2.
Hence, the solution p¯·;t,q¯·;t,r¯·,·;t,l¯·;t does not depend on t. Thus we denote the solution of (A.1) by p¯·,q¯·,r¯·,·,l¯·.
We have then, for any t∈0,T and s∈t,T,
ps;t=−ϕs,αs−1p¯s+G¯EXˆTFTα+μ1Xˆt+μ2+∫sTϕτ,ατQ¯EXˆτFταdτ,
and qis;t,rks,z;t,ljs;t=−ϕs,αs−1q¯is,r¯ks,z,l¯js for i=1,2,…,p, k=1,2,…,l, and j=1,2,…,d. □
From the representation (A.2) we have, for any t∈0,T and s∈t,T,
Us;t−Us;s=B⊤ps;t−ps;s=B⊤ϕs,αs−1μ1Xˆs−Xˆt.
Moreover, since B and ϕs,αs−1 are uniformly bounded, for any a>0, t∈0,T and ε∈0,T−t, we obtain
P1εE∫tt+εUs;tds−1εE∫tt+εUs;sds≥a,≤1aE1εE∫tt+εUs;tds−1εE∫tt+εUs;sdsds,≤K1ε∫tt+εEXˆs−Xˆtds=0,
where the last equality is due to Xˆ· being right-continuous with finite left limits.
Hence, for each t there exists a sequence εntn≥0⊂0,T−t such that limn→∞εnt=0 and
limn→∞1εntE∫tt+εntUs;tds−1εntE∫tt+εntUs;sds=0,dP-a.s.
Moreover, we get from Lemma 16 that there exists a subsequence of εntn≥0, which we also denote by εntn≥0, such that
limn→∞1εntE∫tt+εntUs;sds=Ut;t,dt-a.e.,dP-a.s.
□
Proceeding with standard arguments by using Gronwall’s lemma and the moment inequalities for diffusion processes with jumps (see, e.g., Lemma 4.1 in [29]), we obtain (3.15) and (3.16).
Moreover, it follows from the dynamics of yε,v· in (3.13) that
Eyε,vsFsα=∫tsE[A(r,αr)yε,vrFrα]dr
for all s∈[t,T]. By setting Ψ(s)=A(s,αs) in Lemma A.1 in [30], we get for some positive constant C that
∫tsE[A(r,αr)yε,vrFrα]dr2≤C∫tsE[A(r,αr)yε,vrFrα]2dr,≤Cεξε,
where ξ:Ω×]0,∞[→]0,∞[ satisfies ξ(ε)↓0 as ε↓0, a.s., which proves (3.17).
Now, we consider the difference
Jt,Xˆt,αt;uε.−Jt,Xˆt,αt;uˆ.=E∫tTQXˆs+Q¯EXˆsFsα,yε,vs+Yε,vs+12Qyε,vs+Yε,vs,yε,vs+Yε,vs+12Q¯Eyε,vs+Yε,vsFsα,Eyε,vs+Yε,vsFsα+Ruˆs,v1t,t+εs+12Rv,v1t,t+εsds+12Gyε,vT+Yε,vT,yε,vT+Yε,vT+GXˆT+G¯EXˆTFTα+μ1Xˆt+μ2,yε,vT+Yε,vT+12G¯Eyε,vT+Yε,vTFTα,Eyε,vT+Yε,vTFTα.
From (H1) and (3.15)–(3.17) the following estimate follows:
E∫tT12Q¯Eyε,vs+Yε,vsFsα,Eyε,vs+Yε,vsFsαds+12G¯Eyε,vT+Yε,vTFTα,Eyε,vT+Yε,vTFTα=oε.
Then, from the terminal conditions in the adjoint equations, it follows that
Jt,Xˆt,αt;uε.−Jt,Xˆt,αt;uˆ.=E∫tTQXˆs+Q¯EXˆsFsα,yε,vs+Yε,vs+12Qyε,vs+Yε,vs,yε,vs+Yε,vs+Ruˆs,v1t,t+εs+12Rv,v1t,t+εsds−pT;t,yε,vT+Yε,vT−12PTyε,vT+Yε,vT,yε,vT+Yε,vT+oε.
Now, by applying Ito’s formula to s↦ps;t,yε,vs+Yε,vs on t,T and by taking the expectation, we get
EpT;t,yε,vT+Yε,vT=E∫tTv⊤BTps;t1t,t+εs+yε,vs+Yε,vs⊤QXˆs+Q¯EXˆsFsα+∑i=1pv⊤DiTqis1t,t+εs+∑k=1l∫R∗v⊤FkzTrks,zθαkdz1t,t+εsds.
By applying Ito’s formula to s↦Psyε,vs+Yε,vs,yε,vs+Yε,vs on t,T, we conclude from (H1) together with (3.15)–(3.17) and by taking the conditional expectation that
EPTyε,vT+Yε,vT,yε,vT+Yε,vT=E∫tTyε,vs+Yε,vs⊤Qsyε,vs+∑i=1pv⊤Di⊤PsDiv1t,t+εs+∑k=1l∫R∗v⊤Fkz⊤Ps+Γs,zFkzv1t,t+εsθαkdzds+oε.
By taking (A.6) and (A.7) in (A.5), it follows that
Jt,Xˆt,αt;uε.−Jt,Xˆt,αt;uˆ.=−E∫tt+εv⊤B⊤ps;t+∑i=1pv⊤Di⊤qis+12∑i=1pv⊤Di⊤PsDiv−v⊤Ruˆs−12v⊤Rv+∑k=1l∫R∗v⊤Fkz⊤rks,z+12Ps+ΓsFkzvθαkdzds+oε,
which is equivalent to (3.18). □
First, we have
Jt,Xˆt,αt;uε·−Jt,Xˆt,αt;uˆ·=E∫tT12QXεs+Xˆs+Q¯EXεs+XˆsFsα,Xεs−Xˆs+12Ruεs+uˆs,uεs−uˆsds+12GXεT+XˆT+G‾EXεT+XˆTFTα+2μ1Xˆt+μ2,XεT−XˆT.
By applying Itô’s formula to s↦ps;t,Xεs−Xˆs, we note thatEGXˆT+G‾EXˆTFTα+μ1Xˆt+μ2,XεT−XˆT=−E∫tTB⊤ps;t+∑i=1pDi⊤qis+∑k=1l∫R∗Fkz⊤rks,zθαkdz,uεs−uˆs+QXˆs+Q¯EXˆsFTα,Xεs−Xˆsds.
By completing the square we get
=E∫tTQ2Xεs−Xˆs2+Q¯2EXεs+XˆsFTα2ds+12∫tt+εRv+2uˆs−2(B⊤ps;t+∑i=1pDi⊤qis+∑k=1l∫R∗Fkz⊤rks,zθαkdz),vds+G2XεT−XˆT2+G‾2EXεT+XˆTFTα2,≥12E∫tt+εRv−2Us;t,vds≥−∫tt+εEUs;t,vds.
Now we can divide by εn and send εn to 0. Therefore, it follows from Lemma 8 that uˆ· is an equilibrium control. □
Suppose that uˆ· is an admissible control and denote by Xˆ· the controlled process corresponding to it. According to Corollary 12, suppose that there exists a flow of 4-tuples of adapted processes p·;·,q·;·,r·,·;·,l·;· such that Xˆ·,p·;·,q·;·,r·,·;·,l·;· satisfies the following system of regime-switching forward-backward stochastic differential equations
dXˆs=AXˆs+Buˆs+bds+∑i=1pCiXˆs+Diuˆs+σidWis+∑k=1l∫R∗EkzXˆs−+Fkzuˆs+ckzN˜αkds,dz,s∈0,T,dps;t=−A⊤ps;t+∑i=1pCi⊤qis;t+∑k=1l∫R∗Ekz⊤rks,z;tθαkdzdps;t=−QXˆs−Q¯EXˆsFsαds+∑i=1pqis;tdWisdps;t=+∑k=1l∫R∗rks,z;tN˜αkds,dz+∑j=1dljs,tdΦ˜js,s∈t,T,Xˆ0=x0,α0=ei0,pT;t=−GXˆT−G¯EXˆTFTα−μ1Xˆt−μ2,
with the equilibrium condition dP-a.s.,dt-a.e.B⊤pt;t+∑i=1pDi⊤qit+∑k=1l∫R∗Fkz⊤rkt,zθαkdz−Ruˆt=0.
Now, to solve the above system, we assume the following ansatz: for 0≤t≤s≤T, we put
ps;t=−Ms,αsXˆs−M¯s,αsEXˆsFsα−Υs,αsXˆt−φs,αs,
where M·,·, M¯·,·, Υ·,· and φ·,· are deterministic, differentiable functions which are to be determined. From the terminal condition of the adjoint process, M·,·, M¯·,·, Υ·,· and φ·,· must satisfy the following terminal boundary condition, for all ei∈χ,
MT,ei=G,M¯T,ei=G¯,ΥT,ei=μ1,φT,ei=μ2.
Applying Itô’s formula to (A.12) and using (A.10) yields
dps;t=−LMs,αsXˆs+LM‾s,αsEXˆsFsα+LΥs,αsXˆt+Lφs,αs+Ms,αsAXˆs+Buˆs+b+M¯s,αsAEXˆsFsα+BEuˆsFsα+bds−Ms,αs∑i=1pCiXˆs+Diuˆs+σidWis−Ms,αs∑k=1l∫R∗EkzXˆs−+Fkzuˆs+ckzN˜αkds,dz−∑j=1dMs,ej−Ms,αs−Xˆs+M¯s,ej−M¯s,αs−EXˆsFsα+Υs,ej−Υs,αs−Xˆt+φs,ej−φs,αs−dΦ˜j(s).
Comparing with (A.10), we deduce that, for i=1,2,…,p, k=1,2,…,l, and j=1,2,…,d,
qis;t=qis=−Ms,αsCiXˆs+Diuˆs+σi,rks,z;t=rks,z=−Ms,αsEkzXˆs−+Fkzuˆs+ckz,ljs;t=ljs=−Ms,ej−Ms,αsXˆsljs;t=ljs=−M¯s,ej−M¯s,αsEXˆsFsαljs;t=ljs=−Υs,ej−Υs,αsXˆt−φs,ej−φs,αs.
Moreover, by taking (A.12) and (A.15) in (A.11), we obtain
Ruˆt+B⊤Mt,αt+M¯t,αt+Υt,αtXˆt+B⊤φt,αt+∑i=1pDi⊤Mt,αtCiXˆt+Diuˆt+σi+∑k=1l∫R∗Fkz⊤Mt,αtEkzXˆt−+Fkzuˆt+ckθαkdz=0.
Subsequently, we obtain that uˆ· admits the following representation
uˆs=−Ψs,αsXˆs−ψs,αs,
where Ψ·,· and ψ·,· are given by (3.24).
Hence (3.25) holds, and for s∈0,T we have
EuˆsFsα=−Ψs,αsEXˆsFsα−ψs,αs.
Next, comparing the ds term in (A.14) with the ones in the second equation in (A.10), then by using the expressions (3.25) and (3.24), we obtain
0=LM+MA+A⊤M+∑i=1pCi⊤MCi−MB+∑i=1pCi⊤MDi+∑k=1l∫R∗Ekz⊤MFkzθαkdzΨs,αs+∑k=1l∫R∗Ekz⊤MEkzθαkdz+QXˆs+LM¯+M¯A−BΨ+A⊤M¯+Q¯EXˆsFsα+LΥ+A⊤ΥXˆt+Lφ+A⊤φ+Ms,αs+M¯s,αsb−Bψs,αs+∑i=1pCi⊤Mσi−Diψ+∑k=1l∫R∗Ekz⊤Mckz−Fkzψθαkdz.
This suggests that the functions M·,·, M¯·,·, Υ·,· and φ·,· solve the system of equations (3.23). In addition, we can verify that Ψ·,· and ψ·,· in (3.25) are both uniformly bounded. Then for s∈0,T the following linear SDE with jumps
dXˆs=A−BΨs,αsXˆs+b−Bψs,αsdsdXˆs=+∑i=1pCi−DiΨs,αsXˆs+σi−Diψs,αsdWisdXˆs=+∑k=1l∫R∗Ekz−FkzΨs,αsXˆs−+ckz−Fkzψs,αsN˜αkds,dz,Xˆ0=x0,α0=ei0,
has a unique solution Xˆ·∈SF20,T;Rn, and the following estimate holds
Esups∈0,TXˆs2≤K1+x02.
Hence the control uˆ· defined by (3.25) is admissible. □
Suppose that there is another equilibrium control u˜·∈LF,p20,T;Rm and denote by X˜· its corresponding controlled state process, and by p˜·;·,q˜·,r˜·,·,l˜· the corresponding unique solution to the BSDE (3.4) with Xˆ· replaced by X˜·. Then by Corollary 12 the 5-tuple (p˜·;·,q˜·,r˜·,·,l˜·,u˜·) satisfies dP-a.s.,dt-a.e.B⊤p˜t;t+∑i=1pDi⊤q˜it+∑k=1l∫R∗Fkz⊤r˜kt,zθαkdz−Ru˜t=0.
Now, we define for t∈0,T, s∈t,T, i=1,…,p, k=1,…,l, j=1,2,…,d:
pˆs;t=p˜s;t+Ms,αsX˜s+M¯s,αsEX˜sFsαpˆs;t=+Υs,αsX˜t+φs,αs,qˆis=q˜is+Ms,αsCiX˜s+Diu˜s+σis,rˆks,z=r˜ks,z+Ms,αsEkzX˜s−+Fkzu˜s−+ckz,lˆjs=l˜js+Ms,ej−Ms,αsX˜slˆjs=+M¯s,ej−M¯s,αsEX˜sFsαlˆjs=+Υs,ej−Υs,αsX˜t+φs,ej−φs,αs.
It is easy to prove that
pˆ·;t,qˆ·,rˆ·,·,lˆ·∈L×LF20,T;Rnp×LF,pθ,20,T×R∗;Rnl×LF,pλ,20,T;Rnd.
By (A.17) we have dP-a.s.,dt-a.e.−B⊤pˆt;t−Mt,αt+M¯t,αt+Υt,αtX˜t−−φt,αt−∑i=1pDi⊤qˆit−Mt,αtCitX˜t−+Diu˜t+σi−∑k=1l∫R∗Fkz⊤rˆkt,z−Mt,αtEkzX˜t−+Fkzu˜t+ckzθαkdz+Ru˜t=0.
Since Θt,αt exists dP-a.s.,dt-a.e., using (3.24), we get
u˜t=Θt,αtB⊤pˆt;t+∑i=1pDi⊤qˆit+∑k=1l∫R∗Fkz⊤rˆkt,zθαkdz−Ψt,αtX˜t−−ψt,αt.
From the above equality, we remark that if pˆt;t=qˆt=rˆt,z=0, dP-a.s.,dt-a.e., then the form of u˜· is the same as the form of the feedback control law specified by (3.25), and hence the uniqueness of the equilibrium control given by (3.25) holds. Moreover, for any t∈0,T and for any s∈t,T we have
dpˆs;t=dp˜s;t+dMs,αsX˜s+M¯s,αsEX˜sFsα+Υs,αsX˜t+φs,αs.
Using the equations for p˜·;t, X˜·, M·,·, M¯·,·, Υ·,· and φ·,·, respectively, and using equality (4.6) we find that pˆ·;·,qˆ·,rˆ·,·,lˆ· satisfies
dpˆs;t=−gs,pˆs;t,qˆs,rˆs,z,pˆs;s,Epˆs;sFsα,EqˆsFsα,Erˆs,zFsαds+∑i=1pqˆisdWisdpˆs;t=+∑k=1l∫R∗rˆks−,zN˜αkds,dz+∑j=1dlˆjsdΦ˜js,0≤t≤s≤T,pˆT;t=0,t∈0,T,
where
gs,pˆs;t,qˆs,rˆs,z,pˆs;s,Epˆs;sFsα,EqˆsFsα,Erˆs,zFsα=A⊤pˆs;t+∑i=1pCi⊤qˆis+∑k=1l∫R∗Ekz⊤rˆks,zθαkdz−Ms,αsB+∑i=1pCi⊤Ms,αsDi+∑k=1l∫R∗Ekz⊤Ms,αsFkzθαkdzΘs,αs×B⊤pˆs;s+∑i=1pDi⊤qˆis+∑k=1l∫R∗Fkz⊤rˆks,zθαkdz−M¯s,αsBΘs,αsB⊤Epˆs;sFsα+∑i=1pDi⊤EqˆisFsα−M¯s,αsBΘs,αs+∑k=1l∫R∗Fkz⊤Erˆks,zFsαθαkdz.
We will prove in the next lemma that Equation (A.19) admits at most one solution in L×LF20,T;Rnp×LF,pθ,20,T×R∗;Rnl×LF,pλ,20,T;Rnd. Thus pˆ≡0, qˆ≡0, rˆ≡0 and lˆ≡0, hence the uniqueness of the equilibrium control given by (3.25) holds. □
For the uniqueness of solution to (A.19), we have the following lemma.
Equation (A.19) admits at most one solution inL×LF20,T;Rnp×LF,pθ,20,T×R∗;Rnl×LF,pλ,20,T;Rnd.
For any t∈0,T and s∈t,T, applying Itô’s formula and taking expectations, we obtain that there exists a constant K1>0 such that
Epˆs;t2+∑i=1p∫sTqˆiτ2dτ+∑k=1l∫sT∫R∗rˆkτ,z2θαkdzdτ+∑j=1d∫sTlˆjτ2λjτdτ≤K1E∫sTpˆτ;tpˆτ;t+∑i=1pqˆiτ+∑k=1l∫R∗rˆkτ,zθαkdz+∑j=1dlˆjτλjτ+pˆτ;τ+Epˆτ;τFτα+∑i=1pEqˆiτFτα+∑k=1l∫R∗Erˆkτ,zFταθαkdz+∑j=1dElˆjτFταλjτdτ≤K2E∫sTpˆτ;t2+pˆτ;τ2dτ+12E∑i=1p∫sTqˆiτ2dτ+∑k=1l∫sT∫R∗rˆkτ,z2θαkdzdτ+∑j=1d∫sTlˆjτ2λjτdτ,
where we have used the inequality cab≤βc2a2+1βb2, valid for all β>0, a>0, b>0. Hence there exists a K3>0 such that
Epˆs;t2+∑i=1pE∫sTqˆiτ2dτ+∑k=1lE∫sT∫R∗rˆkτ,z2θαkdzdτ+∑j=1dE∫sTlˆjτ2λjτdτ≤K3E∫sTpˆτ;t2+pˆτ;τ2dτ.
Then we have, for any t∈0,T and s∈t,T,
Epˆs;t2≤K3E∫sTpˆτ;t2+pˆτ;τ2dτ,
thus
Epˆs;t2≤K3T−tsupτ∈t,TEpˆτ;t2+supτ∈t,TEpˆτ;τ2≤2K3T−tsupt≤τ≤s≤TEpˆs;τ2,
hence
supt≤τ≤s≤TEpˆs;τ2≤2K3T−tsupt≤τ≤s≤TEpˆs;τ2.
If we take ϵ=1/(8K3), we get that, for t∈T−ϵ,T and s∈t,T,
supt≤τ≤s≤TEpˆs;τ2≤14supt≤τ≤s≤TEpˆs;τ2,
hence
supt≤τ≤s≤TEpˆs;τ2=0,
which means that pˆs;τ=0, P-a.s. for all τ,s such that t≤τ≤s≤T. For t∈T−2ϵ,T−ϵ and s∈T−ϵ,T, since we have pˆτ;τ=0 for τ∈s,T, by (A.22), we have
Epˆs;t2≤K3E∫sTpˆτ;t2dτ,
and by Gronwall’s inequality we conclude that pˆs;t=0.
Now for t∈T−2ϵ,T−ϵ and s∈t,T−ϵ, since we have pˆT−ϵ;t=0, we apply the above analysis for the region t∈T−ϵ,T and s∈t,T to confirm that pˆs;τ=0, P-a.s. for all τ,s such that t≤τ≤s≤T−ϵ. We then iterate the same analysis for t∈T−3ϵ,T−2ϵ, and so on, until time t=0. Hence pˆs;t=0, P-a.s., for every t,s∈D0,T.
Finally, by (A.21) we obtain
E∫0T∑i=1pqˆiτ2+∑k=1l∫R∗rˆτ,z2θαkdz+∑j=1dlˆjτ2λjτdτ≤K3E∫0Tpˆτ;t2+pˆτ;τ2dτ=0,
which yields that qˆ≡0, rˆ≡0 and lˆ≡0. □
Existence and uniqueness of solutions to SDE and BSDE
In what follows, we will state some basic results on SDEs and BSDEs with jumps which we have used in this paper.
Let t∈0,T, denote by P the Ft-predictable σ-field on 0,T×Ω and by BH the Borel σ-algebra of any topological space H. For any given s∈0,T, consider the SDE with jumps
X(t)=ξ+∫stb(r,X(r),αr)dr+∫stσ(r,X(r),αr)dW(r)+∬R∗×(s,t]c(r,z,X(r−),αr)N˜α(dr,dz),
where s≤t≤T. Here the coefficients (ξ,b,σ,c) are given mappings ξ:Ω⟶Rn, b:[0,T]×Ω×Rn×χ⟶Rn, σ≡σ1,σ2,…,σp:[0,T]×Ω×Rn×χ⟶Rn×p, c≡c1,c2,…,cl:[0,T]×Ω×R∗×Rn×χ⟶Rn×l satisfying the assumptions below:
ξ∈L2Ω,Ft,P;Rn, the coefficients b,σ are P⊗BRn⊗Bχ measurable and c is P⊗BRn⊗B(R∗)⊗Bχ measurable and, for all ei∈χ,
E∫0Tb(t,0,ei)+σ(t,0,ei)+∫R∗c(t,z,0,ei)θαdzdt<∞;
b,σ and c are uniformly Lipschitz continuous w.r.t. x, that is, there exists a constant C>0 s.t. for all (t,x,x¯,ei)∈[0,T]×Rn×Rn×χ and a.s. ω∈Ω,
|b(t,x,ei)−b(t,x¯,ei)|2+|σ(t,x,ei)−σ(t,x¯,ei)|2+∫R∗|c(t,z,x,ei)−c(t,z,x¯,ei)|2θαdz⩽C|x−x¯|2.
If the coefficients (ξ,b,σ,c) satisfy Assumptions (H’1)–(H’2), then the SDE (A.23) has a unique solution X(·)∈SF2s,T;Rn. Moreover, the following estimate holds: Esups≤t≤TXs2≤K1+Eξ2.
Let 0=τ0<τ1<τ2<⋯<τn<⋯ be the jump times of the Markov chain α(·), and let e1∈χ be the starting state. Thus α(t)=e1 on τ0,τ1, and the system (A.23) for t∈τ0,τ1 has the form:
dX(t)=b(t,X(t),e1)dt+σ(t,X(t),e1)dW(t)+∫R∗c(t,z,X(t−),e1)N˜α(dt,dz).
By Theorem 117 in [25], the above SDE has the unique solution X(·) in the space SF2τ0,τ1;Rn, and by continuity for t=τ1 as well. By considering ατ1=e2, the system for t∈τ1,τ2 becomes
dX(t)=b(t,X(t),e2)dt+σ(t,X(t),e2)dW(t)+∫R∗c(t,z,X(t−),e2)N˜α(dt,dz).
Again, by Theorem 117 in [25], this SDE has a unique solution X(·) in the space SF2([τ1,τ2);Rn), and by continuity for t=τ2 as well. Repeating this process continuously, we obtain that the solution X(·) of system (A.23) remains in SF20,T;Rn with probability one. □
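The concatenation argument above translates directly into a simulation scheme: freeze the regime between the jump times of the chain and advance the diffusion with an Euler step under the frozen coefficients. The linear coefficients and switching rates in this sketch are illustrative assumptions.

```python
import numpy as np

# Euler scheme for a scalar regime-switching SDE, regime frozen between jumps.
rng = np.random.default_rng(2)
b = {1: 0.05, 2: 0.10}                 # per-regime drift coefficients (assumed)
sig = {1: 0.2, 2: 0.4}                 # per-regime volatilities (assumed)
rates = {1: 2.0, 2: 4.0}               # exit rates of the chain (assumed)
T, n = 1.0, 1000
dt = T / n

X, state, t = 1.0, 1, 0.0
next_jump = rng.exponential(1.0 / rates[state])
for _ in range(n):
    X += b[state] * X * dt + sig[state] * X * rng.normal(0.0, np.sqrt(dt))
    t += dt
    if t >= next_jump:                 # regime switch: redraw the holding time
        state = 3 - state
        next_jump = t + rng.exponential(1.0 / rates[state])
```

Between consecutive jump times this is exactly the Euler discretization of the fixed-regime SDE, so the piecewise construction in the proof is mirrored step by step.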
The form of linear BSDEs (3.4) and (3.5) given in Section 3.1 is the motivation for us to study the following general BSDE with Markov switching
Y(t)=ς+∫tTg(s,Y(s),Z(s),K(s,·),Vs,α(s))ds−∫tT∑i=1pZi(s)dWi(s)−∫tT∫R∗∑r=1lKr(s,z)N˜αr(ds,dz)−∫tT∑j=1dVjsdΦ˜js,t∈[0,T].
Here g:Ω×[0,T]×Rn×Rn×d×L2R∗,BR∗,θ;Rn×l×Lλ2×χ→Rn, where Lλ2 is the set of functions I(·):χ→Rn×d such that ‖I(·)‖λ2:=∑j=1d|Ij(t)|2λj(t)<∞. We make the following assumptions.
ς∈L2Ω,Ft,P;Rn.
For all (y,z,k,υ)∈Rn×Rn×d×L2R∗,BR∗,θ;Rn×l×Lλ2 and ei∈χ, i=1,…,d, g(·,y,z,k,υ,ei)∈LF20,T;Rn.
∀ei∈χ, g(t,y,z,k,υ,ei) is uniformly Lipschitz with respect to y, z, k and υ, i.e. there exists a constant C>0 such that for all (ω,t)∈Ω×[0,T], y,y′∈Rn, z,z′∈Rn×d, k,k′∈L2R∗,BR∗,θ;Rn×l, υ,υ′∈Lλ2,
g(t,y,z,k,υ,ei)−gt,y′,z′,k′,υ′,ei≤Cy−y′+z−z′+k−k′θ+υ−υ′λ.
Suppose that (H’3)–(H’5) hold. Then BSDE with Markov switching (A.24) admits a unique solution.
Before proving this theorem, we give an extended martingale representation result in the following lemma. Its proof follows from Lemma 3.1 in Cohen and Elliott [7], together with Proposition 3.2 in Shi and Wu [27].
Lett∈0,T. ForM∈L2Ω,Ft,P;Rn, there exists a unique process(Y,Z,K,V)∈SF20,T;Rn×L20,T;Rnp×LF,pθ,20,T×R∗;Rnl×LF,pλ,20,T;Rndsuch thatMt=M(0)+∫0t∑i=1pZi(s)dWi(s)+∫0t∫R∗∑r=1lKr(s,z)N˜αr(ds,dz)+∫0t∑j=1dVjsdΦ˜js.
First we note that, for all
(y,z,k,υ)∈SF20,T;Rn×L20,T;Rnp×LF,pθ,20,T×R∗;Rnl×LF,pλ,20,T;Rnd,
the following is valid:
E∫0Tg(s,y(s),z(s),k(s,·),υs,α(s))ds2≤2E∫0T(g(s,y(s),z(s),k(s,·),υs,α(s))−g(s,0,0,0,0,α(s)))ds2+2E∫0Tg(s,0,0,0,0,α(s))ds2,≤C∑i=1dE∫0T|g(s,0,0,0,0,ei)|2ds+CE∫0T|y(s)|2+|z(s)|2+‖k(s,·)‖θ2+‖υ(s)‖λ2ds<∞.
It follows that
ς+∫0Tg(s,y(s),z(s),k(s,·),υs,α(s))ds∈L2Ω,Ft,P;Rn.
From assumptions (H’3)–(H’5), it is clear that
\[
M(t)=\mathbb{E}\Big[\varsigma+\int_0^T g\big(s,y(s),z(s),k(s,\cdot),\upsilon(s),\alpha(s)\big)\,ds\ \Big|\ \mathcal F_t\Big]
\]
is a square-integrable F_t-martingale. By virtue of the martingale representation result above, there exists a unique triple
(Z,K,V)∈L^2(0,T;R^{n×p})×L_{F,p}^{θ,2}([0,T]×R^*;R^{n×l})×L_{F,p}^{λ,2}(0,T;R^{n×d})
such that
\[
M(t)=M(0)+\int_0^t\sum_{i=1}^{p}Z_i(s)\,dW_i(s)+\int_0^t\int_{\mathbb{R}^*}\sum_{r=1}^{l}K_r(s,z)\,\widetilde N_\alpha^{\,r}(ds,dz)+\int_0^t\sum_{j=1}^{d}V_j(s)\,d\widetilde\Phi_j(s).
\]
Setting
\[
Y(t):=M(t)-\int_0^t g\big(s,y(s),z(s),k(s,\cdot),\upsilon(s),\alpha(s)\big)\,ds,
\]
so that Y(T)=ς and Y∈S_F^2(0,T;R^n), we may define the mapping Δ from
S_F^2(0,T;R^n)×L^2(0,T;R^{n×p})×L_{F,p}^{θ,2}([0,T]×R^*;R^{n×l})×L_{F,p}^{λ,2}(0,T;R^{n×d})
into itself by Δ(y,z,k,υ):=(Y,Z,K,V). On this space
we introduce the norm defined by
\[
\|(y,z,k,\upsilon)\|_{\beta,\theta,\lambda}^2:=\mathbb{E}\int_0^T e^{\beta s}\big(|y(s)|^2+|z(s)|^2+\|k(s,\cdot)\|_\theta^2+\|\upsilon(s)\|_\lambda^2\big)\,ds,
\]
where β>0 is a constant to be determined later; since 1≤e^{βs}≤e^{βT} on [0,T], this norm is equivalent to the original one. We will prove that Δ is a contraction mapping under the norm ‖·‖_{β,θ,λ}. For this purpose, let
(y,z,k,υ), (y′,z′,k′,υ′)∈S_F^2(0,T;R^n)×L^2(0,T;R^{n×p})×L_{F,p}^{θ,2}([0,T]×R^*;R^{n×l})×L_{F,p}^{λ,2}(0,T;R^{n×d}),
and set (Y,Z,K,V)=Δ(y,z,k,υ) and (Y′,Z′,K′,V′)=Δ(y′,z′,k′,υ′). Write
\[
(\hat y,\hat z,\hat k,\hat\upsilon)=(y-y',\,z-z',\,k-k',\,\upsilon-\upsilon'),\qquad
(\hat Y,\hat Z,\hat K,\hat V)=(Y-Y',\,Z-Z',\,K-K',\,V-V').
\]
We know that (Ŷ,Ẑ,K̂,V̂) belongs to the same product space and that E[sup_{0≤t≤T}|Ŷ(t)|^2]<∞. Note that
\[
\hat Y(t)=\int_t^T\big(g(s,y(s),z(s),k(s,\cdot),\upsilon(s),\alpha(s))-g(s,y'(s),z'(s),k'(s,\cdot),\upsilon'(s),\alpha(s))\big)\,ds
-\int_t^T\hat Z(s)\,dW(s)-\int_t^T\int_{\mathbb{R}^*}\hat K(s,z)\,\widetilde N_\alpha(ds,dz)-\int_t^T\hat V(s)\,d\widetilde\Phi(s),\qquad t\in[0,T].
\]
Applying Itô's formula to e^{βs}|Ŷ(s)|², we obtain
\[
\begin{aligned}
&\mathbb{E}|\hat Y(0)|^2+\mathbb{E}\int_0^T\big(\beta|\hat Y(s)|^2+|\hat Z(s)|^2+\|\hat K(s,\cdot)\|_\theta^2+\|\hat V(s)\|_\lambda^2\big)e^{\beta s}\,ds\\
&\quad=\mathbb{E}\int_0^T 2\hat Y(s)\big(g(s,y(s),z(s),k(s,\cdot),\upsilon(s),\alpha(s))-g(s,y'(s),z'(s),k'(s,\cdot),\upsilon'(s),\alpha(s))\big)e^{\beta s}\,ds\\
&\quad\le 2C\,\mathbb{E}\int_0^T|\hat Y(s)|\big(|\hat y(s)|+|\hat z(s)|+\|\hat k(s,\cdot)\|_\theta+\|\hat\upsilon(s)\|_\lambda\big)e^{\beta s}\,ds\\
&\quad\le \frac12\,\mathbb{E}\int_0^T\big(|\hat y(s)|^2+|\hat z(s)|^2+\|\hat k(s,\cdot)\|_\theta^2+\|\hat\upsilon(s)\|_\lambda^2\big)e^{\beta s}\,ds
+8C^2\,\mathbb{E}\int_0^T|\hat Y(s)|^2 e^{\beta s}\,ds.
\end{aligned}
\]
Choosing β=1+8C² and absorbing the last term into the left-hand side, we get
\[
\mathbb{E}\int_0^T\big(|\hat Y(s)|^2+|\hat Z(s)|^2+\|\hat K(s,\cdot)\|_\theta^2+\|\hat V(s)\|_\lambda^2\big)e^{\beta s}\,ds
\le\frac12\,\mathbb{E}\int_0^T\big(|\hat y(s)|^2+|\hat z(s)|^2+\|\hat k(s,\cdot)\|_\theta^2+\|\hat\upsilon(s)\|_\lambda^2\big)e^{\beta s}\,ds,
\]
i.e.
\[
\|(\hat Y,\hat Z,\hat K,\hat V)\|_{\beta,\theta,\lambda}^2\le\frac12\,\|(\hat y,\hat z,\hat k,\hat\upsilon)\|_{\beta,\theta,\lambda}^2.
\]
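The elementary inequality used to split each cross term above (with u standing for |Ŷ(s)| and a for any one of the four Lipschitz differences) is a Young-type bound:

```latex
2C\,u\,a \;\le\; 2C^{2}u^{2}+\tfrac{1}{2}\,a^{2},
\qquad\text{since}\qquad
\Big(\sqrt{2}\,C\,u-\tfrac{a}{\sqrt{2}}\Big)^{2}\;\ge\;0 .
```

Applying it to each of the four difference terms and summing gives the quadratic bound on the right-hand side.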
Hence Δ is a strict contraction on
S_F^2(0,T;R^n)×L^2(0,T;R^{n×p})×L_{F,p}^{θ,2}([0,T]×R^*;R^{n×l})×L_{F,p}^{λ,2}(0,T;R^{n×d}).
By the Banach fixed-point theorem, Δ admits a unique fixed point, which is the unique solution of (A.24). The proof is complete. □
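The contraction argument is constructive: iterating Δ from any starting point converges geometrically to the fixed point. A toy deterministic sketch of this Picard scheme, for a backward equation y(t) = ξ + ∫_t^T g(y(s)) ds with a Lipschitz generator standing in for (H'5) (the function names and coefficients below are purely illustrative, not from the paper):

```python
def picard_solve_backward(g, xi, T=1.0, n=200, iters=30):
    """Picard iteration for the toy backward equation
        y(t) = xi + int_t^T g(y(s)) ds
    on a uniform grid: the deterministic analogue of iterating Delta."""
    dt = T / n
    y = [0.0] * (n + 1)                      # arbitrary initial guess
    for _ in range(iters):
        y_new = [0.0] * (n + 1)
        y_new[n] = xi                        # terminal condition
        for k in range(n - 1, -1, -1):
            # generator frozen at the previous iterate, as in the map Delta
            y_new[k] = y_new[k + 1] + g(y[k]) * dt
        y = y_new
    return y

# Example: g(y) = -0.5*y, xi = 1, so the exact solution is y(t) = exp(-0.5*(T - t))
y = picard_solve_backward(lambda v: -0.5 * v, 1.0)
```

With Lipschitz constant C = 0.5 and T = 1, each sweep contracts the error by a factor of roughly CT = 0.5, so thirty sweeps suffice; y[0] then approximates exp(-1/2) ≈ 0.6065 up to the discretization error of the grid.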
References
[1] Basak, S., Chabakauri, G.: Dynamic mean-variance asset allocation. 23, 2970–3016 (2010). https://doi.org/10.1093/rfs/hhq028
[2] Bensoussan, A., Sung, K.C.J., Yam, S.C.P.: Linear-quadratic time-inconsistent mean field games. 3(4), 537–552 (2013). MR3127149. https://doi.org/10.1007/s13235-013-0090-y
[3] Björk, T., Khapko, M., Murgoci, A.: On time-inconsistent stochastic control in continuous time. 21, 331–360 (2017). MR3626618. https://doi.org/10.1007/s00780-017-0327-5
[4] Björk, T., Murgoci, A., Zhou, X.Y.: Mean-variance portfolio optimization with state-dependent risk aversion. 24(1), 1–24 (2014). MR3157686. https://doi.org/10.1111/j.1467-9965.2011.00515.x
[5] Chen, P., Yang, H., Yin, G.: Markowitz's mean-variance asset-liability management with regime switching: a continuous-time model. 43(3), 456–465 (2008). MR2479605. https://doi.org/10.1016/j.insmatheco.2008.09.001
[6] Chen, P., Yang, H.: Markowitz's mean-variance asset-liability management with regime switching: a multi-period model. 18(1), 29–50 (2011). MR2786975. https://doi.org/10.1080/13504861003703633
[7] Cohen, S.N., Elliott, R.J.: Solutions of backward stochastic differential equations on Markov chains. 2, 251–262 (2008). MR2446692. https://doi.org/10.31390/cosa.2.2.05
[8] Czichowsky, C.: Time-consistent mean-variance portfolio selection in discrete and continuous time. 17(2), 227–271 (2013). MR3038591. https://doi.org/10.1007/s00780-012-0189-9
[9] Delong, Ł., Gerrard, R.: Mean-variance portfolio selection for a nonlife insurance company. 66, 339–367 (2007). MR2342219. https://doi.org/10.1007/s00186-007-0152-2
[10] Ekeland, I., Mbodji, O., Pirvu, T.A.: Time-consistent portfolio management. 3, 1–32 (2012). MR2968026. https://doi.org/10.1137/100810034
[11] Ekeland, I., Pirvu, T.A.: Investment and consumption without commitment. 2, 57–86 (2008). MR2461340. https://doi.org/10.1007/s11579-008-0014-6
[12] Elliott, R.J., Aggoun, L., Moore, J.B.: Springer, New York (1994). MR1323178
[13] Goldman, S.M.: Consistent plans. 47, 533–537 (1980)
[14] Hu, Y., Jin, H., Zhou, X.: Time-inconsistent stochastic linear-quadratic control: characterization and uniqueness of equilibrium. 55(2), 1261–1279 (2017). MR3639569. https://doi.org/10.1137/15M1019040
[15] Hu, Y., Jin, H., Zhou, X.Y.: Time-inconsistent stochastic linear quadratic control. 50(3), 1548–1572 (2012). MR2968066. https://doi.org/10.1137/110853960
[16] Krusell, P., Smith, A.: Consumption and savings decisions with quasi-geometric discounting. 71, 366–375 (2003)
[17] Li, Y., Li, Z.: Optimal time-consistent investment and reinsurance strategies for mean-variance insurers with state-dependent risk aversion. 53(1), 86–97 (2013). MR3081464. https://doi.org/10.1016/j.insmatheco.2013.03.008
[18] Liang, Z., Song, M.: Time-consistent reinsurance and investment strategies for mean-variance insurer under partial information. 65, 66–76 (2015). MR3430397. https://doi.org/10.1016/j.insmatheco.2015.08.008
[19] Nguyen, S.L., Yin, G., Nguyen, D.T.: A general stochastic maximum principle for mean-field controls with regime switching. 84, 3255–3294 (2021). MR4308229. https://doi.org/10.1007/s00245-021-09747-x
[20] Øksendal, B., Sulem, A.: 2nd edn. Springer, New York (2007). MR2322248. https://doi.org/10.1007/978-3-540-69826-5
[21] Pham, H.: Linear quadratic optimal control of conditional McKean–Vlasov equation with random coefficients and applications. 1, 7 (2016). MR3583182. https://doi.org/10.1186/s41546-016-0008-x
[22] Peng, S.: A general stochastic maximum principle for optimal control problems. 28, 966–979 (1990). MR1051633. https://doi.org/10.1137/0328054
[23] Phelps, E.S., Pollak, R.A.: On second-best national saving and game-equilibrium growth. 35, 185–199 (1968). https://doi.org/10.2307/2296547
[24] Pollak, R.: Consistent planning. 35, 185–199 (1968)
[25] Rong, S.: Springer, New York (2006). MR2160585
[26] Shen, Y., Siu, T.K.: The maximum principle for a jump-diffusion mean-field model and its application to the mean-variance problem. 86, 58–73 (2013). MR3053556. https://doi.org/10.1016/j.na.2013.02.029
[27] Shi, J., Wu, Z.: Backward stochastic differential equations with Markov switching driven by Brownian motion and Poisson random measure. 87(1), 1–29 (2015). MR3306809. https://doi.org/10.1080/17442508.2014.914514
[28] Strotz, R.: Myopia and inconsistency in dynamic utility maximization. 23, 165–180 (1955). https://doi.org/10.2307/2295722
[29] Song, Y., Tang, S., Wu, Z.: The maximum principle for progressive optimal stochastic control problems with random jumps. 58(4), 2171–2187 (2020). MR4127097. https://doi.org/10.1137/19M1292308
[30] Sun, Z., Guo, X.: Equilibrium for a time-inconsistent stochastic linear-quadratic control system with jumps and its application to the mean-variance problem. 181(2), 383–410 (2019). MR3938474. https://doi.org/10.1007/s10957-018-01471-x
[31] Tang, S., Li, X.: Necessary conditions for optimal control of stochastic systems with random jumps. 32(5), 1447–1475 (1994). MR1288257. https://doi.org/10.1137/S0363012992233858
[32] Wang, T.: Uniqueness of equilibrium strategies in dynamic mean-variance problems with random coefficients. 490(1), 124199 (2020). MR4099907. https://doi.org/10.1016/j.jmaa.2020.124199
[33] Wang, H., Wu, Z.: Partially observed time-inconsistency recursive optimization problem and application. 161(2), 664–687 (2014). MR3193813. https://doi.org/10.1007/s10957-013-0326-4
[34] Wei, J., Wong, K.C., Yam, S.C.P., Yung, S.P.: Markowitz's mean-variance asset-liability management with regime switching: a time-consistent approach. 53(1), 281–291 (2013). MR3081480. https://doi.org/10.1016/j.insmatheco.2013.05.008
[35] Wu, Z., Wang, X.: FBSDE with Poisson process and its application to linear quadratic stochastic optimal control problem with random jumps. 29, 821–826 (2003). MR2033363
[36] Yang, B.Z., He, X.J., Zhu, S.P.: Continuous-time mean-variance-utility portfolio problem and its equilibrium strategy. Optimization. MR3175527. https://doi.org/10.1080/02331934.2021.1939339
[37] Yong, J.: A deterministic linear quadratic time-inconsistent optimal control problem. 1, 83–118 (2011). MR2822686. https://doi.org/10.3934/mcrf.2011.1.83
[38] Yong, J.: Linear quadratic optimal control problems for mean-field stochastic differential equations: time-consistent solutions. 51(4), 2809–2838 (2013). MR3072755. https://doi.org/10.1137/120892477
[39] Yong, J.: Time-inconsistent optimal control problems and the equilibrium HJB equation. 2(3), 271–329 (2012). MR2991570. https://doi.org/10.3934/mcrf.2012.2.271
[40] Yong, J., Zhou, X.Y.: Springer, New York (1999). MR1696772. https://doi.org/10.1007/978-1-4612-1466-3
[41] Zhang, X., Sun, Z., Xiong, J.: A general stochastic maximum principle for a Markov regime switching jump-diffusion model of mean-field type. 56(4), 2563–2592 (2018). MR3828847. https://doi.org/10.1137/17M112395X
[42] Zhao, Q., Shen, Y., Wei, J.: Consumption-investment strategies with non-exponential discounting and logarithmic utility. 238(3), 824–835 (2014). MR3214861. https://doi.org/10.1016/j.ejor.2014.04.034
[43] Zeng, Y., Li, Z.: Optimal time-consistent investment and reinsurance policies for mean-variance insurers. 49, 145–154 (2011). MR2811903. https://doi.org/10.1016/j.insmatheco.2011.01.001
[44] Zeng, Y., Li, Z., Lai, Y.: Time-consistent investment and reinsurance strategies for mean-variance insurers with jumps. 52(3), 498–507 (2013). MR3054742. https://doi.org/10.1016/j.insmatheco.2013.02.007
[45] Zhou, X.Y., Li, D.: Continuous-time mean-variance portfolio selection: a stochastic LQ framework. 42, 19–33 (2000). MR1751306. https://doi.org/10.1007/s002450010003
[46] Zhou, X.Y., Yin, G.: Markowitz's mean-variance portfolio selection with regime switching: a continuous-time model. 42, 1466–1482 (2003). MR2044805. https://doi.org/10.1137/S0363012902405583