The paper presents a characterization of equilibrium in a game-theoretic description of a discounting conditional stochastic linear-quadratic (LQ for short) optimal control problem, in which the controlled state process evolves according to a multidimensional linear stochastic differential equation whose noise is driven by a Poisson process and an independent Brownian motion under the effect of Markovian regime-switching. The running and terminal costs in the objective functional depend explicitly on several quadratic terms of the conditional expectation of the state process as well as on a nonexponential discount function, which creates the time-inconsistency of the considered model. Open-loop Nash equilibrium controls are described through necessary and sufficient equilibrium conditions. A state-feedback equilibrium strategy is obtained via a certain differential-difference system of ODEs. As an application, we study equilibrium investment–consumption and reinsurance/new-business strategies under mean-variance utility for insurers whose risk aversion is a function of the current wealth level. The financial market consists of one riskless asset and one risky asset whose price process is modeled by a geometric Lévy process, and the surplus of the insurer is assumed to follow a jump-diffusion model, where the values of the parameters change according to a continuous-time Markov chain. A numerical example is provided to demonstrate the efficacy of the theoretical results.
Keywords: stochastic maximum principle; time-inconsistency; LQ control problem; equilibrium control; variational inequality. MSC: 93E20; 60H30; 93E99; 60H10.
Introduction
For usual optimal control problems, by Bellman's principle of optimality [40] one may check that an optimal control remains optimal when restricted to a later time interval; that is, optimal controls are time-consistent. The time-consistency feature provides a powerful tool for dealing with optimal control problems. The dynamic programming principle establishes relationships among a family of time-consistent optimal control problems parameterized by initial pairs (of time and state) through the so-called Hamilton–Jacobi–Bellman (HJB) equation, which is a nonlinear partial differential equation. If the HJB equation is solvable, then one can find an optimal feedback control by taking the optimizer of the generalized Hamiltonian involved in the HJB equation.
However, in reality, time-consistency can be lost in various ways: as time passes, an optimal control may fail to remain optimal. Among the possible causes of time-inconsistency, three play particularly important roles:
the appearance of conditional expectations for the state data in the objective functional [3],
the presence of a state-dependent risk aversion in the objective functional [4],
the nonexponential discounting situation [16].
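The third source above can be made concrete with a small numeric illustration (not from the paper; the parameter values are hypothetical). Under exponential discounting the relative weight \(h(t+s)/h(t)\) assigned to a payoff \(s\) units of time ahead does not depend on the evaluation date \(t\), so preferences formulated at different dates agree; under hyperbolic discounting \(h(t)=1/(1+kt)\) this ratio drifts with \(t\), which is precisely why a plan made at time 0 is no longer optimal when re-evaluated later.

```python
# Illustrative sketch: why nonexponential discounting breaks time-consistency.
import math

def exponential(t, rho=0.1):
    return math.exp(-rho * t)

def hyperbolic(t, k=0.1):
    return 1.0 / (1.0 + k * t)

def relative_weight(h, t, s):
    """Weight the decision-maker at time t assigns to a payoff s units later."""
    return h(t + s) / h(t)

# Exponential: the ratio is the same whatever the evaluation date t.
r0 = relative_weight(exponential, 0.0, 1.0)
r5 = relative_weight(exponential, 5.0, 1.0)
print(abs(r0 - r5) < 1e-12)  # True: time-consistent

# Hyperbolic: the ratio changes with t, so a plan made at t=0 is
# re-evaluated differently at t=5 -- the source of time-inconsistency.
print(relative_weight(hyperbolic, 0.0, 1.0))
print(relative_weight(hyperbolic, 5.0, 1.0))
```

For the hyperbolic case the two printed ratios differ (1/1.1 versus 1.5/1.6), whereas for the exponential case they coincide exactly.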
The portfolio optimization problem with a hyperbolic discount function [11] and state-dependent risk aversion in mean-variance models [17, 43] and [44] are two well-known instances of time-inconsistency in mathematical finance. Motivated by the second example, the present paper studies a general linear-quadratic optimal control problem for jump diffusions, which is time-inconsistent in the sense that it does not satisfy the Bellman optimality principle, owing to quadratic terms in the conditional expectation of the controlled state process as well as a state-dependent risk-aversion term in the running and terminal cost functionals. The fundamental challenge with time-inconsistent optimal control models is that, in general, the dynamic programming approach and the standard HJB equation cannot be employed. One way around the time-inconsistency issue is to consider only precommitted strategies; see, e.g., [45] and [26].
However, the main approach to time-inconsistency is to treat time-inconsistent problems as noncooperative games, in which decisions are taken by a different player at each moment of time, each intending to maximize or minimize his own objective functional. As a result, Nash equilibria are sought rather than optimal solutions; see, e.g., [3, 8, 11, 15, 16, 23, 24, 28, 37, 38] and [39]. Strotz [28] was the first to apply this game perspective, in dealing with the dynamic time-inconsistent decision problem posed by the deterministic Ramsey problem. He proposed a rudimentary notion of Nash equilibrium strategy by capturing the concept of noncommitment and allowing the commitment period to be infinitesimally small. Further references extending [28] are [16, 24] and [13]. Ekeland and Pirvu [11] gave a formal definition of feedback Nash equilibrium controls in a continuous-time setting in order to investigate the optimal investment–consumption problem under general discount functions in both deterministic and stochastic frameworks. Björk and Murgoci [3] and Ekeland et al. [10] further extended Ekeland and Pirvu's work. Yong [39] proposed an alternative method for analyzing general discounting time-inconsistent optimal control problems in continuous time by considering a discrete-time counterpart. Zhao et al. [42] investigated the consumption–investment problem under a general discount function and a logarithmic utility function using Yong's method. Wang and Wu investigated a partially observed time-inconsistent recursive optimization problem in [33]. Basak and Chabakauri [1] touched upon the continuous-time Markowitz mean-variance portfolio selection problem, while Björk et al. [4] addressed mean-variance portfolio selection with state-dependent risk aversion. Hu et al. [15], followed by Czichowsky [8], found a time-consistent strategy for mean-variance portfolio selection in a non-Markovian framework.
Linear-quadratic optimal control problems are well known as a fundamental class of optimal control problems, since they cover a wide range of applications, such as the mean-variance portfolio selection model in finance. Furthermore, the LQ model can be used to approximate many nonlinear control problems. In recent years, time-inconsistent LQ control problems have attracted considerable attention. Yong [37] worked on a general discounted time-inconsistent deterministic LQ model and used a forward ordinary differential equation coupled with a backward Riccati–Volterra integral equation to obtain closed-loop equilibrium strategies. Hu et al. [15] presented a precise definition of open-loop Nash equilibrium controls in a continuous-time setting, distinct from that for the feedback controls in [11], in order to analyze a time-inconsistent stochastic linear-quadratic optimal control problem with stochastic coefficients. Yong [39] studied a time-inconsistent stochastic LQ problem for mean-field-type stochastic differential equations. Finally, Hu et al. [14] investigated the uniqueness of the equilibrium solution found in [15]; they were the first to give a positive result regarding the uniqueness of the solution to a time-inconsistent problem.
There is little work in the literature concerning equilibrium strategies for optimal investment and reinsurance problems under the mean-variance criterion. Zeng and Li [43] were the first to study Nash equilibrium strategies for mean-variance insurers with constant risk aversion, where the surplus process of the insurer is described by a diffusion model and the price processes of the risky stocks are driven by geometric Brownian motions; they obtained equilibrium reinsurance and investment strategies explicitly using the technique of [3]. Li and Li [17] obtained equilibrium strategies in the case of state-dependent risk aversion through a set of well-posed integral equations. Zeng et al. [44] investigated time-consistent investment and reinsurance strategies for mean-variance insurers under constant risk aversion, in which the surplus process and the price process of the risky stock are both jump-diffusion processes.
Markov regime-switching models have recently attracted considerable interest in financial applications; see, for example, [46, 5, 6, 34] and [18]. Markov regime-switching models allow the market to face a financial crisis at any moment. At any given time the market is supposed to be governed by some regime; a bull market, in which stock prices are generally increasing, is a standard illustration of such a regime. After a financial crisis the market's behavior changes radically, and a switch in the regime represents the crisis. The problem of mean-variance optimization under a continuous-time Markov regime-switching financial market was first studied by Zhou and Yin [46]. By applying stochastic linear-quadratic control methods, they obtained mean-variance efficient portfolios and efficient frontiers via solving two systems of ordinary linear differential equations. In continuous-time and multiperiod settings, Chen et al. [5] and Chen and Yang [6], respectively, studied the mean-variance asset-liability management problem. Mean-variance asset-liability management problems with a continuous-time Markov regime-switching setup have been studied by Wei et al. [34], who explicitly deduced a time-consistent investment strategy using the method of [3]. Liang and Song [18] investigated optimal investment and reinsurance problems for insurers with mean-variance utility under partial information, where the stock's drift rate and the insurer's risk aversion are both Markov-modulated.
In this work, we present a general time-inconsistent stochastic conditional LQ control problem. Unlike most existing studies [15, 39, 2, 42], where the noise is driven by a Brownian motion, in our LQ system the state evolves according to an SDE whose noise is driven by a multidimensional Brownian motion and an independent multidimensional Poisson point process under a Markov regime-switching setup. Continuous-time mean-variance criteria with state-dependent risk aversion are included as special cases of the objective functional. We establish a stochastic system that describes open-loop Nash equilibrium controls, using the variational technique proposed by Hu et al. [14]. We emphasize that our model generalizes the ones investigated by Zeng and Li [43], Li et al. [17], Sun and Guo [30] and Zeng et al. [44], in addition to some classes of time-inconsistent stochastic LQ optimal control problems introduced in [15].
The paper is organized as follows. In the second section, we formulate the problem and provide essential notations and preliminaries. Section 3 is dedicated to the necessary and sufficient conditions for equilibrium, which are our main results; we then obtain the unique equilibrium control in state-feedback representation through a specific class of ordinary differential equations. In the last section, we apply the results of Section 3 to find the unique equilibrium reinsurance, investment and consumption strategies for the mean-variance-utility portfolio problem, and we discuss some special cases. The paper concludes with an Appendix that includes some proofs.
Problem setting
Let \((\Omega,\mathcal F,\mathbb F,P)\) be a filtered probability space, where \(\mathbb F:=\{\mathcal F_t\}_{t\in[0,T]}\) is a right-continuous, \(P\)-completed filtration to which all of the processes outlined below are adapted, namely the Markov chain, the Brownian motions, and the Poisson random measures.
Throughout the present paper, we assume that the Markov chain α(·) takes values in the finite state space χ={e1,e2,…,ed}, where d∈N, ei∈Rd and the j-th component of ei is the Kronecker delta δij for each i,j∈{1,…,d}. H:=(λij)1≤i,j≤d denotes the rate matrix of the Markov chain under P. Note that λij is the constant transition intensity of the chain from state ei to state ej. As a result, for i≠j we have λij≥0 and ∑j=1d λij=0, thus λii≤0. In the sequel, we assume that λij>0 for each i,j=1,2,…,d with i≠j; consequently, λii<0. We have the following semimartingale representation of the Markov chain α(·), obtained from Elliott et al. [12]:
\[ \alpha(t)=\alpha(0)+\int_0^t H^\top \alpha(\tau)\,d\tau+M(t), \]
where {M(t)|t∈[0,T]} is an Rd-valued (F,P)-martingale.
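The generator property of the rate matrix H can be checked, and the chain itself simulated, with a short sketch. This is not from the paper: the 3-state rate matrix below is hypothetical, and the simulation uses the standard construction (exponential holding time with rate \(-\lambda_{ii}\), then a jump to \(e_j\) with probability \(\lambda_{ij}/(-\lambda_{ii})\)).

```python
# A minimal sketch of the finite-state Markov chain alpha(.) with rate matrix
# H = (lambda_ij): off-diagonal entries are nonnegative transition intensities
# and each row sums to zero, so lambda_ii = -sum_{j != i} lambda_ij < 0.
import random

H = [[-0.5, 0.3, 0.2],
     [ 0.4, -0.7, 0.3],
     [ 0.1, 0.6, -0.7]]   # hypothetical 3-state rate matrix

for row in H:              # generator property: every row sums to zero
    assert abs(sum(row)) < 1e-12

def simulate_chain(H, i0, T, rng):
    """Simulate a path of the chain on [0, T]: exponential holding time with
    rate -H[i][i], then jump to state j with probability H[i][j]/(-H[i][i])."""
    t, i, path = 0.0, i0, [(0.0, i0)]
    while True:
        rate = -H[i][i]
        t += rng.expovariate(rate)
        if t >= T:
            return path
        u, acc = rng.random(), 0.0
        for j in range(len(H)):
            if j == i:
                continue
            acc += H[i][j] / rate
            if u <= acc:
                i = j
                break
        path.append((t, i))

rng = random.Random(0)
path = simulate_chain(H, 0, 10.0, rng)
print(path[0] == (0.0, 0))  # True: the path starts at the initial state
```

The jump counts of such a path are exactly the processes \(J_{ij}(t)\) and \(\Phi_j(t)\) introduced next.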
First, we introduce a set of Markov jump martingales associated with the chain α(·), which will be used to model the controlled state process. For each i,j∈{1,…,d} with i≠j, and t∈[0,T], denote by \(J_{ij}(t):=\lambda_{ij}\int_0^t\langle\alpha(\tau-),e_i\rangle\,d\tau+m_{ij}(t)\) the number of jumps from state ei to state ej up to time t, where \(m_{ij}(t):=\int_0^t\langle\alpha(\tau-),e_i\rangle\,\langle dM(\tau),e_j\rangle\) is an (F,P)-martingale. Let Φj(t) denote the number of jumps into state ej up to time t, for each fixed j=1,2,…,d; then
\[ \Phi_j(t)=\sum_{i=1,\,i\neq j}^{d}J_{ij}(t)=\sum_{i=1,\,i\neq j}^{d}\lambda_{ij}\int_0^t\langle\alpha(\tau-),e_i\rangle\,d\tau+\tilde\Phi_j(t), \]
where Φ˜j(t):=∑i=1,i≠jdmij(t) is an F,P-martingale for each j=1,2,…,d. For each j=1,2,…,d set
\[ \lambda_j(t)=\sum_{i=1,\,i\neq j}^{d}\lambda_{ij}\int_0^t\langle\alpha(\tau),e_i\rangle\,d\tau. \]
Note that the process Φ˜j(t)=Φj(t)−λj(t) is an F,P-martingale, for each j=1,2,…,d.
Now, we present the Markov regime-switching Poisson random measures. Assume that Ni(dt,dz), i=1,2,…,l, are independent Poisson random measures on \(([0,T]\times\mathbb R_0,\ \mathcal B([0,T])\otimes\mathcal B_0)\) under P, and that the compensator of the Poisson random measure Ni(dt,dz) is given by
\[ n_\alpha^i(dt,dz):=\theta_{\alpha(t-)}^i(dz)\,dt=\langle\alpha(t-),\theta^i(dz)\rangle\,dt, \]
where \(\theta^i(dz):=(\theta_{e_1}^i(dz),\theta_{e_2}^i(dz),\dots,\theta_{e_d}^i(dz))^\top\in\mathbb R^d\). The subscript α in \(n_\alpha^i\), for i=1,2,…,l, represents the dependence of the probability law of the Poisson random measure on the Markov chain α(·). In fact, \(\theta_{e_j}^i(dz)\) is the conditional Lévy density of the jump sizes of the random measure Ni(dt,dz) at time t when α(t−)=ej, for each j=1,2,…,d. Furthermore, the compensated Poisson random measure \(\tilde N_\alpha(dt,dz)\) is given by
\[ \tilde N_\alpha(dt,dz)=\big(N^1(dt,dz)-n_\alpha^1(dt,dz),\ \dots,\ N^l(dt,dz)-n_\alpha^l(dt,dz)\big)^\top. \]
Notations
Throughout this paper, we use the following notations: \(\mathbb S^n\) is the set of n×n symmetric real matrices; \(C^\top\) is the transpose of the vector (or matrix) C; \(\langle\cdot,\cdot\rangle\) is the inner product in some Euclidean space. For any Euclidean space \(H=\mathbb R^n\) or \(\mathbb S^n\) with Frobenius norm \(|\cdot|\), and p,l,d∈N, we denote for any t∈[0,T]:
\( L^p(\Omega,\mathcal F_t,P;H)=\{\xi:\Omega\to H \mid \xi\ \text{is } \mathcal F_t\text{-measurable and } \mathbb E|\xi|^p<\infty\} \), for any \(p\ge 1\);
Throughout this paper, we consider a multidimensional nonhomogeneous linear controlled jump-diffusion system starting from the initial data \((t,\xi,e_i)\in[0,T]\times L^2(\Omega,\mathcal F_t^\alpha,P;\mathbb R^n)\times\chi\), defined by
\[
\begin{aligned}
dX(s)={}&\big[A(s,\alpha(s))X(s)+B(s,\alpha(s))u(s)+b(s,\alpha(s))\big]\,ds\\
&+\sum_{i=1}^{p}\big[C_i(s,\alpha(s))X(s)+D_i(s,\alpha(s))u(s)+\sigma_i(s,\alpha(s))\big]\,dW_i(s)\\
&+\sum_{k=1}^{l}\int_{\mathbb R^*}\big[E_k(s,z,\alpha(s))X(s-)+F_k(s,z,\alpha(s))u(s)+c_k(s,z,\alpha(s))\big]\,\tilde N_\alpha^k(ds,dz),\quad s\in[t,T],\\
X(t)={}&\xi,\qquad \alpha(t)=e_i.
\end{aligned}\tag{2.1}
\]
The coefficients \(A(\cdot,\cdot), C_i(\cdot,\cdot):[0,T]\times\chi\to\mathbb R^{n\times n}\); \(B(\cdot,\cdot),D_i(\cdot,\cdot):[0,T]\times\chi\to\mathbb R^{n\times m}\); \(b(\cdot,\cdot),\sigma_i(\cdot,\cdot):[0,T]\times\chi\to\mathbb R^n\); \(E_k(\cdot,\cdot,\cdot):[0,T]\times\mathbb R^*\times\chi\to\mathbb R^{n\times n}\); \(F_k(\cdot,\cdot,\cdot):[0,T]\times\mathbb R^*\times\chi\to\mathbb R^{n\times m}\); \(c_k(\cdot,\cdot,\cdot):[0,T]\times\mathbb R^*\times\chi\to\mathbb R^n\) are deterministic matrix-valued functions. For any t∈[0,T], the class of admissible control processes over [t,T] is restricted to \(L_{\mathbb F,p}^2(t,T;\mathbb R^m)\). For any \(u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\), we denote by \(X(\cdot)=X^{t,\xi,e_i}(\cdot;u(\cdot))\) its solution; different controls u(·) lead to different solutions X(·).
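The state dynamics (2.1) can be approximated by an Euler scheme. The sketch below is a heavily simplified illustration, not the paper's general setup: scalar state, a single regime (so the coefficients are constants), one Brownian motion, and one compensated compound Poisson process with Exp(1) jump sizes; every parameter value is hypothetical.

```python
# Euler sketch for a scalar special case of (2.1):
# dX = (A X + B u + b) ds + (C X + D u + sig) dW + E X d(comp. compound Poisson).
import math, random

def euler_path(x0, T=1.0, n=200, A=-0.2, B=0.5, b=0.1,
               C=0.3, D=0.1, sig=0.2, E=0.05, lam=2.0,
               u=lambda s, x: 0.0, rng=None):
    rng = rng or random.Random(0)
    dt, x = T / n, x0
    mean_jump = 1.0                       # E[z] for Exp(1) jump sizes
    for k in range(n):
        s = k * dt
        us = u(s, x)
        dW = rng.gauss(0.0, math.sqrt(dt))
        # compound Poisson increment over [s, s+dt], then compensate it
        jump_sum, t_acc = 0.0, rng.expovariate(lam)
        while t_acc < dt:
            jump_sum += rng.expovariate(1.0)  # jump size z ~ Exp(1)
            t_acc += rng.expovariate(lam)
        comp = jump_sum - lam * mean_jump * dt
        x += (A * x + B * us + b) * dt + (C * x + D * us + sig) * dW + E * x * comp
    return x

x_T = euler_path(1.0)
print(math.isfinite(x_T))  # True
```

In the paper's full model the coefficients would additionally be re-evaluated at the current regime α(s) along a simulated chain path, and the Brownian and Poisson drivers would be multidimensional.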
In practice, an observable switching process is used to represent, for instance, interest-rate processes over various market settings. For example, the market may be broadly split into “bullish” and “bearish” states, with characteristics varying greatly between the two modes. Applications of switching models in mathematical finance can be found, for example, in [5, 6] and the references therein.
To measure the performance of u·∈LF,p2t,T;Rm, we introduce the following cost functional
\[
\begin{aligned}
J(t,\xi,e_i;u(\cdot))={}&\mathbb E\bigg[\int_t^T\frac12\Big(\langle Q(s)X(s),X(s)\rangle+\big\langle \bar Q(s)\,\mathbb E[X(s)\mid\mathcal F_s^\alpha],\,\mathbb E[X(s)\mid\mathcal F_s^\alpha]\big\rangle+\langle R(t,s)u(s),u(s)\rangle\Big)ds\\
&+\langle \mu_1\xi+\mu_2,X(T)\rangle+\frac12\langle GX(T),X(T)\rangle+\frac12\big\langle \bar G\,\mathbb E[X(T)\mid\mathcal F_T^\alpha],\,\mathbb E[X(T)\mid\mathcal F_T^\alpha]\big\rangle\bigg].
\end{aligned}\tag{2.2}
\]
Due to the general influence of the modulating switching process α(·), conditional expectation is employed rather than expectation in (2.2). The presence of α(·) in all coefficients of the state equation (2.1) makes the objective functional depend on the process's history. This type of cost functional is also motivated by practical problems such as the conditional mean-variance portfolio selection problem considered in Section 4 of this paper. A reader interested in this type of problem is referred to [21] and [19]. The term \(\langle\mu_1\xi+\mu_2,X(T)\rangle\) stems from a state-dependent utility function in economics [4].
We need to impose the following assumptions on the coefficients.
The functions A·,·, B·,·, b·,·, Ci·,·, Di·,·, σi·,·, Ek·,·,·, Fk·,·,· and ck·,·,· are deterministic, continuous and uniformly bounded. The coefficients of the cost functional satisfy
\( Q(\cdot),\bar Q(\cdot)\in C([0,T];\mathbb S^n),\quad R(\cdot,\cdot)\in C(D[0,T];\mathbb S^m),\quad G,\bar G\in\mathbb S^n,\quad \mu_1\in\mathbb R^{n\times n},\quad \mu_2\in\mathbb R^n, \)
where \(D[0,T]:=\{(t,s):0\le t\le s\le T\}\).
The functions R(·,·), Q(·) and G satisfy \(R(t,t)\ge 0\), \(Q(t)\ge 0\) for all t∈[0,T], and \(G\ge 0\).
Based on [25], we can prove under (H1) that, for any \((t,\xi,e_i)\in[0,T]\times L^2(\Omega,\mathcal F_t^\alpha,P;\mathbb R^n)\times\chi\) and \(u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\), the state equation (2.1) has a unique solution \(X(\cdot)\in S_{\mathbb F}^2(t,T;\mathbb R^n)\). Moreover, we have the estimate
\[ \mathbb E\Big[\sup_{t\le s\le T}|X(s)|^2\Big]\le K\big(1+\mathbb E|\xi|^2\big), \]
for some positive constant K. In particular for t=0 and u·∈LF,p20,T;Rm, Equation (2.1) starting from initial data 0,x0 has a unique solution X·∈SF2(0,T;Rn) for which
\[ \mathbb E\Big[\sup_{0\le s\le T}|X(s)|^2\Big]\le K\big(1+|x_0|^2\big). \]
Our optimal control problem can be formulated as follows.
Problem (N). For any initial triplet \((t,\xi,e_i)\in[0,T]\times L^2(\Omega,\mathcal F_t^\alpha,P;\mathbb R^n)\times\chi\), find a control \(\hat u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\) such that
\[ J(t,\xi,e_i;\hat u(\cdot))=\min_{u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)}J(t,\xi,e_i;u(\cdot)). \]
Any \(\hat u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\) satisfying the above is called a pre-commitment optimal control. The presence of quadratic terms in the conditional expectation of the state process, as well as of a state-dependent term in the objective functional, destroys the time-consistency of pre-committed optimal solutions. Hence, Problem (N) is time-inconsistent, with two different sources of time-inconsistency.
The main results: characterization and uniqueness of equilibrium
In view of the fact that Problem (N) is time-inconsistent, the aim of this paper is to characterize open-loop Nash equilibria as an alternative to optimal strategies. We employ the game-theoretic approach to handle the time-inconsistency from the same viewpoint as Ekeland and Pirvu [11] and Björk and Murgoci [3]. Let us briefly explain the game perspective that we will consider.
We consider a game with one player at every point t in the interval [0,T). This player corresponds to the incarnation of the controller on instant t and is referred to as “player t”.
The t-th player controls the system only at time t by choosing his/her policy \(u(t,\cdot):\Omega\to\mathbb R^m\).
A control process u(·) is then viewed as a complete description of the chosen strategies of all players in the game.
The reward to the player t is specified by the functional Jt,ξ,ei;u·.
We explain the concept of a “Nash equilibrium strategy” for the game described above: this is an admissible control process \(\hat u(\cdot)\) fulfilling the following criterion. Assume that every player s, with s>t, will apply the strategy \(\hat u(s)\); then the optimal decision for player t is to also use the strategy \(\hat u(t)\). However, the difficulty with this “definition” is that the individual player t has no effect on the game's outcome: he/she selects the control only at the single instant t and, since this is a time set of Lebesgue measure zero, the control dynamics are unaffected.
As a result, to identify open-loop Nash equilibrium controls, we follow [15], where a formal definition (Definition 4 below), inspired by [11], is proposed.
In the rest of the paper, for brevity, we suppress the arguments (s,α(s)) in the coefficients A(s,α(s)), B(s,α(s)), b(s,α(s)), Ci(s,α(s)), Di(s,α(s)), σi(s,α(s)); we likewise suppress the arguments s and (s,t) in the coefficients Q(s), Q̄(s), R(s,t); and we use the notation ϱ(z) instead of ϱ(s,z,α(s)) for ϱ=Ek,Fk,ck. Furthermore, we sometimes simply call \(\hat u(\cdot)\) an equilibrium control instead of an open-loop Nash equilibrium control when there is no confusion.
In this section, we provide the main results on the necessary and sufficient conditions for equilibrium of the control problem formulated in the preceding section. To make the presentation clearer, the proofs are relegated to Appendix A. To proceed towards the definition of an equilibrium, we first introduce the local spike variation of a given admissible control \(\hat u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\): for any t∈[0,T], \(v\in L^2(\Omega,\mathcal F_{t-}^\alpha,P;\mathbb R^m)\) and ε∈(0,T−t), define
\[ u^\varepsilon(s)=\begin{cases}\hat u(s)+v, & s\in[t,t+\varepsilon),\\[2pt] \hat u(s), & s\in[t+\varepsilon,T].\end{cases}\tag{3.1} \]
We have the following definition. (Open-loop Nash equilibrium).
An admissible control uˆ·∈LF,p2(t,T;Rm) is an open-loop Nash equilibrium control for Problem (N) if for every sequence εn↓0, we have
\[ \lim_{\varepsilon_n\downarrow 0}\frac{1}{\varepsilon_n}\Big[J\big(t,\hat X(t),\alpha(t);u^{\varepsilon_n}(\cdot)\big)-J\big(t,\hat X(t),\alpha(t);\hat u(\cdot)\big)\Big]\ge 0,\tag{3.2} \]
for any t∈0,T and v∈L2Ω,Ft−α,P;Rm. The corresponding equilibrium dynamics solves the following SDE with jumps: for s∈0,T,
\[
\begin{aligned}
d\hat X(s)={}&\big[A\hat X(s)+B\hat u(s)+b\big]\,ds+\sum_{i=1}^{p}\big[C_i\hat X(s)+D_i\hat u(s)+\sigma_i\big]\,dW_i(s)\\
&+\sum_{k=1}^{l}\int_{\mathbb R^*}\big[E_k(z)\hat X(s-)+F_k(z)\hat u(s)+c_k(z)\big]\,\tilde N_\alpha^k(ds,dz),\\
\hat X(0)={}&x_0,\qquad \alpha(0)=e_{i_0}.
\end{aligned}
\]
Flow of the adjoint equations and characterization of equilibrium controls
In this subsection, we provide general necessary and sufficient conditions characterizing the equilibrium strategies of Problem (N). First, we consider the adjoint equations used in the characterization of equilibrium controls. Let \(\hat u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\) be a fixed control and denote by \(\hat X(\cdot)\in S_{\mathbb F}^2(0,T;\mathbb R^n)\) its corresponding state process. For each t∈[0,T], the first-order adjoint equation, defined on the time interval [t,T] and satisfied by the 4-tuple of processes \((p(\cdot;t),q(\cdot;t),r(\cdot,\cdot;t),l(\cdot;t))\), is given as follows:
\[
\begin{aligned}
dp(s;t)={}&-\Big[A^\top p(s;t)+\sum_{i=1}^{p}C_i^\top q_i(s;t)+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)^\top r_k(s,z;t)\,\theta_\alpha^k(dz)-Q\hat X(s)-\bar Q\,\mathbb E\big[\hat X(s)\mid\mathcal F_s^\alpha\big]\Big]\,ds\\
&+\sum_{i=1}^{p}q_i(s;t)\,dW_i(s)+\sum_{k=1}^{l}\int_{\mathbb R^*}r_k(s,z;t)\,\tilde N_\alpha^k(ds,dz)+\sum_{j=1}^{d}l_j(s;t)\,d\tilde\Phi_j(s),\quad s\in[t,T],\\
p(T;t)={}&-G\hat X(T)-\bar G\,\mathbb E\big[\hat X(T)\mid\mathcal F_T^\alpha\big]-\mu_1\hat X(t)-\mu_2.
\end{aligned}\tag{3.3}
\]
Throughout this section, we will prove that the equilibrium strategy can be obtained by solving a system of FBSDEs which is nonstandard, since the flow of the unknown processes \((p(\cdot;t),q(\cdot;t),r(\cdot,\cdot;t),l(\cdot;t))\) for t∈[0,T] is involved. To the best of our knowledge, solving this type of equation explicitly remains an open problem, except for certain forms of the objective functional. However, by a separation-of-variables approach we are able to solve the present problem completely.
Consider a deterministic matrix-valued function ϕ(·,·) as a solution of the following ODE:
\[ d\phi(s,\alpha(s))=\phi(s,\alpha(s))A^\top\,ds,\quad s\in[0,T],\qquad \phi(T,e_i)=I_n. \]
For any t∈[0,T] and s∈[t,T], the solution of Equation (3.3) has the representation
\[ p(s;t)=-\phi(s,\alpha(s))^{-1}\Big[\bar p(s)+\bar G\,\mathbb E\big[\hat X(T)\mid\mathcal F_T^\alpha\big]+\mu_1\hat X(t)+\mu_2\Big]-\phi(s,\alpha(s))^{-1}\int_s^T\phi(\tau,\alpha(\tau))\bar Q\,\mathbb E\big[\hat X(\tau)\mid\mathcal F_\tau^\alpha\big]\,d\tau, \]
and \((q_i(s;t),r_k(s,z;t),l_j(s;t))=-\phi(s,\alpha(s))^{-1}(\bar q_i(s),\bar r_k(s,z),\bar l_j(s))\) for i=1,2,…,p; k=1,2,…,l; j=1,2,…,d, where
\[
\begin{aligned}
d\bar p(s)={}&-\Big[\sum_{i=1}^{p}\phi(s,\alpha(s))C_i^\top\phi(s,\alpha(s))^{-1}\bar q_i(s)+\sum_{k=1}^{l}\int_{\mathbb R^*}\phi(s,\alpha(s))E_k(z)^\top\phi(s,\alpha(s))^{-1}\bar r_k(s,z)\,\theta_\alpha^k(dz)+\phi(s,\alpha(s))Q\hat X(s)\Big]\,ds\\
&+\sum_{i=1}^{p}\bar q_i(s)\,dW_i(s)+\sum_{k=1}^{l}\int_{\mathbb R^*}\bar r_k(s-,z)\,\tilde N_\alpha^k(ds,dz)+\sum_{j=1}^{d}\bar l_j(s)\,d\tilde\Phi_j(s),\quad s\in[t,T],\\
\bar p(T)={}&G\hat X(T).
\end{aligned}\tag{3.4}
\]
We remark that neither the coefficients nor the terminal condition of (3.4) depend on the starting time t, so (3.4) may be considered as a standard BSDE over the entire time interval [0,T]; following the same arguments as in [27], one can verify that Equation (3.4) admits a unique solution.
From the representation of p·;t,q·;t,r·,·;t,l·;t, for t∈0,T given by Lemma 5, we can check that under (H1) Equation (3.3) admits a unique solution
\( \big(p(\cdot;t),q(\cdot;t),r(\cdot,\cdot;t),l(\cdot;t)\big)\in S_{\mathbb F}^2(t,T;\mathbb R^n)\times L^2(t,T;(\mathbb R^n)^p)\times L_{\mathbb F,p}^{\theta,2}([t,T]\times\mathbb R^*;(\mathbb R^n)^l)\times L_{\mathbb F,p}^{\lambda,2}(t,T;(\mathbb R^n)^d). \)
The following second order adjoint equation is defined on the time interval t,T and satisfied by the 4-tuple of processes P·,Λ·,Γ·;·,L·:
\[
\begin{aligned}
dP(s)={}&-\Big[A^\top P(s)+P(s)A+\sum_{i=1}^{p}\big(C_i^\top P(s)C_i+\Lambda_i(s)C_i+C_i^\top\Lambda_i(s)\big)\\
&\qquad+\sum_{k=1}^{l}\int_{\mathbb R^*}\big(\Gamma_k(s,z)E_k(z)+E_k(z)^\top\Gamma_k(s,z)\big)\,\theta_\alpha^k(dz)\\
&\qquad+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)^\top\big(\Gamma_k(s,z)+P(s)\big)E_k(z)\,\theta_\alpha^k(dz)-Q\Big]\,ds\\
&+\sum_{i=1}^{p}\Lambda_i(s)\,dW_i(s)+\sum_{k=1}^{l}\int_{\mathbb R^*}\Gamma_k(s,z)\,\tilde N_\alpha^k(ds,dz)+\sum_{j=1}^{d}L_j(s)\,d\tilde\Phi_j(s),\quad s\in[t,T],\\
P(T)={}&-G.
\end{aligned}\tag{3.5}
\]
Noting that (3.5) is a standard BSDE over the entire time interval [0,T], and following the same arguments as in [27], we can verify that Equation (3.5) admits a unique solution
\( \big(P(\cdot),\Lambda(\cdot),\Gamma(\cdot,\cdot),L(\cdot)\big)\in S_{\mathbb F}^2(t,T;\mathbb S^n)\times L^2(t,T;(\mathbb S^n)^p)\times L_{\mathbb F,p}^{\theta,2}([t,T]\times\mathbb R^*;(\mathbb S^n)^l)\times L_{\mathbb F,p}^{\lambda,2}(t,T;(\mathbb S^n)^d). \)
Now, associated with \((\hat u(\cdot),\hat X(\cdot),p(\cdot;\cdot),q(\cdot;\cdot),r(\cdot,\cdot;\cdot),P(\cdot),\Gamma(\cdot;\cdot))\), we define, for (s,t)∈D[0,T],
\[ U(s;t)=B^\top p(s;t)+\sum_{i=1}^{p}D_i^\top q_i(s;t)+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top r_k(s,z;t)\,\theta_\alpha^k(dz)-R\hat u(s),\tag{3.6} \]
and
\[ V(s;t)=\sum_{i=1}^{p}D_i^\top P(s)D_i+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top\big(P(s)+\Gamma_k(s,z)\big)F_k(z)\,\theta_\alpha^k(dz)-R.\tag{3.7} \]
Definition 4 is slightly different from the original definition provided by [15] and [14], where the open-loop equilibrium control is given by
\[ \lim_{\varepsilon\downarrow 0}\frac{1}{\varepsilon}\Big[J\big(t,\hat X(t),\alpha(t);u^{\varepsilon}(\cdot)\big)-J\big(t,\hat X(t),\alpha(t);\hat u(\cdot)\big)\Big]\ge 0.\tag{3.8} \]
Although the limit (3.8) already provides a characterizing condition, it is not very useful because it involves an a.s. limit with respect to uncountably many ε>0. In that case, using the RCLL property of the state process X(·), one can deduce an equivalent condition for equilibrium; see Hu et al. [15]. In this paper, we define an open-loop equilibrium control in the sense of (3.2), which is well defined in general.
The following lemma, which will be used later in this study, provides an important property of the flow of adapted processes.
Under assumptions (H1)–(H2), for any \(\hat u(\cdot)\in L_{\mathbb F,p}^2(t,T;\mathbb R^m)\), there exists a sequence \(\{\varepsilon_n^t\}_{n\in\mathbb N}\subset(0,T-t)\) with \(\varepsilon_n^t\to 0\) as \(n\to\infty\), such that
\[ \lim_{n\to\infty}\frac{1}{\varepsilon_n^t}\int_t^{t+\varepsilon_n^t}\mathbb E\big[U(s;t)\big]\,ds=U(t;t),\quad dP\text{-a.s.},\ dt\text{-a.e.} \]
Now we introduce the space
\[ \mathbb L=\Big\{\Lambda(\cdot;t)\in S_{\mathbb F}^2(t,T;\mathbb R^n)\ \text{such that}\ \sup_{t\in[0,T]}\mathbb E\Big[\sup_{s\in[t,T]}|\Lambda(s;t)|^2\Big]<+\infty\Big\}.
\]
Clearly, for any uˆ·∈LF,p20,T;Rm, its associated flow of adjoint processes p·;·∈L.
The following theorem is the first main result of this work; it provides necessary and sufficient conditions for equilibrium controls of the time-inconsistent Problem (N).
(Characterization of equilibrium).
Let (H1) hold. Given an admissible control \(\hat u(\cdot)\in L_{\mathbb F,p}^2(0,T;\mathbb R^m)\), let
\( \big(p(\cdot;\cdot),q(\cdot;\cdot),r(\cdot,\cdot;\cdot),l(\cdot;\cdot)\big)\in\mathbb L\times L_{\mathbb F}^2(0,T;(\mathbb R^n)^p)\times L_{\mathbb F,p}^{\theta,2}([0,T]\times\mathbb R^*;(\mathbb R^n)^l)\times L_{\mathbb F,p}^{\lambda,2}(0,T;(\mathbb R^n)^d) \)
be the unique solution to the BSDE (3.3), and let
\( \big(P(\cdot),\Lambda(\cdot),\Gamma(\cdot,\cdot),L(\cdot)\big)\in S_{\mathbb F}^2(t,T;\mathbb S^n)\times L^2(t,T;(\mathbb S^n)^p)\times L_{\mathbb F,p}^{\theta,2}([t,T]\times\mathbb R^*;(\mathbb S^n)^l)\times L_{\mathbb F,p}^{\lambda,2}(t,T;(\mathbb S^n)^d) \)
be the unique solution to the BSDE (3.5). Then \(\hat u(\cdot)\) is an open-loop Nash equilibrium if and only if the following two conditions hold: the first-order equilibrium condition
\[ U(t;t)=0,\quad dP\text{-a.s.},\ dt\text{-a.e.},\tag{3.11} \]
and the second-order equilibrium condition
\[ V(t;t)\le 0,\quad dP\text{-a.s.},\ \forall t\in[0,T],\tag{3.12} \]
where U(t;t) and V(t;t) are given by (3.6) and (3.7), respectively.
The proof of the above theorem is based on variational techniques, in the spirit of the characterizations of equilibria obtained in [14] and [15] in the absence of random jumps.
Let \(\hat u(\cdot)\in L_{\mathbb F,p}^2(0,T;\mathbb R^m)\) be an admissible control and \(\hat X(\cdot)\) the corresponding controlled state process. Consider the perturbed control \(u^\varepsilon(\cdot)\) defined by the spike variation (3.1) for some fixed arbitrary t∈[0,T], \(v\in L^2(\Omega,\mathcal F_{t-}^\alpha,P;\mathbb R^m)\) and ε∈(0,T−t). Denote by \(\hat X^\varepsilon(\cdot)\) the solution of the state equation corresponding to \(u^\varepsilon(\cdot)\). It follows from the standard perturbation approach (see, for example, [31] and [41]) that \(\hat X^\varepsilon(\cdot)-\hat X(\cdot)=y^{\varepsilon,v}(\cdot)+Y^{\varepsilon,v}(\cdot)\), where \(y^{\varepsilon,v}(\cdot)\) and \(Y^{\varepsilon,v}(\cdot)\) solve the following SDEs, respectively, for s∈[t,T]:
\[
\begin{aligned}
dy^{\varepsilon,v}(s)={}&Ay^{\varepsilon,v}(s)\,ds+\sum_{i=1}^{p}\big[C_iy^{\varepsilon,v}(s)+D_iv\mathbf 1_{[t,t+\varepsilon)}(s)\big]\,dW_i(s)\\
&+\sum_{k=1}^{l}\int_{\mathbb R^*}\big[E_k(z)y^{\varepsilon,v}(s-)+F_k(z)v\mathbf 1_{[t,t+\varepsilon)}(s)\big]\,\tilde N_\alpha^k(ds,dz),\qquad y^{\varepsilon,v}(t)=0,
\end{aligned}
\]
\[
\begin{aligned}
dY^{\varepsilon,v}(s)={}&\big[AY^{\varepsilon,v}(s)+Bv\mathbf 1_{[t,t+\varepsilon)}(s)\big]\,ds+\sum_{i=1}^{p}C_iY^{\varepsilon,v}(s)\,dW_i(s)\\
&+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)Y^{\varepsilon,v}(s-)\,\tilde N_\alpha^k(ds,dz),\qquad Y^{\varepsilon,v}(t)=0.
\end{aligned}
\]
We need the following lemma
Under assumption (H1), the following estimates hold:
\[ \sup_{s\in[t,T]}\mathbb E\big[|y^{\varepsilon,v}(s)|^2\big]=O(\varepsilon),\qquad \sup_{s\in[t,T]}\mathbb E\big[|Y^{\varepsilon,v}(s)|^2\big]=O(\varepsilon^2). \]
We also have
\[ \sup_{s\in[t,T]}\mathbb E\big[\big|\mathbb E[y^{\varepsilon,v}(s)\mid\mathcal F_s^\alpha]\big|^2\big]=O(\varepsilon^2). \]
Moreover, we have the equality
\[ J\big(t,\hat X(t),\alpha(t);u^{\varepsilon}(\cdot)\big)-J\big(t,\hat X(t),\alpha(t);\hat u(\cdot)\big)=-\int_t^{t+\varepsilon}\mathbb E\Big[\langle U(s;t),v\rangle+\frac12\langle V(s;t)v,v\rangle\Big]\,ds+o(\varepsilon).\tag{3.18} \]
Now, we are ready to give the proof of Theorem 9.
Given an admissible control \(\hat u(\cdot)\in L_{\mathbb F,p}^2(0,T;\mathbb R^m)\) for which (3.11) and (3.12) hold, according to Lemma 8 we have from (3.18) that, for any t∈[0,T] and any \(\mathbb R^m\)-valued, \(\mathcal F_t^\alpha\)-measurable and bounded random variable v, there exists a sequence \(\{\varepsilon_n^t\}_{n\in\mathbb N}\subset(0,T-t)\) with \(\varepsilon_n^t\to 0\) as \(n\to\infty\), such that
\[
\begin{aligned}
\lim_{n\to\infty}\frac{1}{\varepsilon_n^t}\Big[J\big(t,\hat X(t),\alpha(t);u^{\varepsilon_n^t}(\cdot)\big)-J\big(t,\hat X(t),\alpha(t);\hat u(\cdot)\big)\Big]&=-\Big[\langle U(t;t),v\rangle+\frac12\langle V(t;t)v,v\rangle\Big]\\
&=-\frac12\langle V(t;t)v,v\rangle\ \ge\ 0,\quad dP\text{-a.s.}
\end{aligned}
\]
Hence uˆ· is an equilibrium strategy.
Conversely, assume that \(\hat u(\cdot)\) is an equilibrium strategy. Then, by (3.2) together with (3.18) and Lemma 8, for any (t,u)∈[0,T]×\(\mathbb R^m\), the following inequality holds:
\[ \langle U(t;t),u\rangle+\frac12\langle V(t;t)u,u\rangle\le 0.\tag{3.19} \]
Now, define for all (t,u)∈[0,T]×\(\mathbb R^m\), \(\Phi(t,u)=\langle U(t;t),u\rangle+\frac12\langle V(t;t)u,u\rangle\). Easy manipulations show that inequality (3.19) is equivalent to \(\Phi(t,0)=\max_{u\in\mathbb R^m}\Phi(t,u)\), dP-a.s., for all t∈[0,T]. This maximum condition is in turn equivalent to the following two conditions:
\[ \Phi_u(t,0)=U(t;t)=0,\quad \forall t\in[0,T],\ dP\text{-a.s.},\qquad \Phi_{uu}(t,0)=V(t;t)\le 0,\quad \forall t\in[0,T],\ dP\text{-a.s.} \]
This completes the proof. □
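The final step of the proof, that the quadratic \(\Phi(t,u)\) attains its maximum at \(u=0\) exactly when the linear part vanishes and the quadratic form is negative semidefinite, can be illustrated numerically. The 2-d data below are purely hypothetical and only check the "if" direction on a grid.

```python
# Toy check: Phi(u) = <U, u> + (1/2) <V u, u> is maximized at u = 0 when
# U = 0 and V is negative definite (here V has eigenvalues < 0).
import itertools

def phi(U, V, u):
    m = len(u)
    lin = sum(U[i] * u[i] for i in range(m))
    quad = sum(u[i] * V[i][j] * u[j] for i in range(m) for j in range(m))
    return lin + 0.5 * quad

U = [0.0, 0.0]                  # first-order condition U(t;t) = 0
V = [[-1.0, 0.2], [0.2, -0.5]]  # negative definite (second-order condition)

grid = [x * 0.5 for x in range(-4, 5)]
vals = [phi(U, V, [a, b]) for a, b in itertools.product(grid, repeat=2)]
print(max(vals) == phi(U, V, [0.0, 0.0]))  # True: maximum attained at u = 0
```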
It is worth noting that, under the positive-semidefiniteness conditions on the coefficients Q(·), G and R(·,·), the corresponding process P(·) in [15] and [14] is indeed positive semidefinite, by the comparison principles for BSDEs. As a consequence of Theorem 9, a necessary and sufficient condition for a control to be an equilibrium strategy is then just the first-order equilibrium condition (3.11). However, there is a significant difference between the estimate for the cost functional presented here and those in [15] and [14]: because stochastic coefficients and random jumps of the controlled system are taken into account, an additional term Γ(·,·) occurs in the equation for P(·), so in this paper P(·) is not necessarily positive semidefinite. This is why we modify the methodology for deriving the sufficient condition for equilibrium controls. We thus have the following corollary, whose proof follows the same arguments as that of Proposition 3.2 in [30].
Let (H1)–(H2) hold. Given an admissible control \(\hat u(\cdot)\in L_{\mathbb F,p}^2(0,T;\mathbb R^m)\), let \(\big(p(\cdot;\cdot),q(\cdot;\cdot),r(\cdot,\cdot;\cdot),l(\cdot;\cdot)\big)\in\mathbb L\times L_{\mathbb F}^2(0,T;(\mathbb R^n)^p)\times L_{\mathbb F,p}^{\theta,2}([0,T]\times\mathbb R^*;(\mathbb R^n)^l)\times L_{\mathbb F,p}^{\lambda,2}(0,T;(\mathbb R^n)^d)\) be the unique solution to the BSDE (3.3). Then \(\hat u(\cdot)\) is an equilibrium if the following condition holds dP-a.s., dt-a.e.:
\[ B^\top p(t;t)+\sum_{i=1}^{p}D_i^\top q_i(t)+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top r_k(t,z)\,\theta_\alpha^k(dz)-R\hat u(t)=0. \]
Linear feedback stochastic equilibrium control
In this subsection, our goal is to obtain a state feedback representation of an equilibrium control for Problem (N) via some class of ordinary differential equations.
Now, suppressing the arguments (s,ei) in the coefficients A, B, b, Ci, Di, σi, and writing ϱ(z) instead of ϱ(s,z,ei) for ϱ=Ek,Fk,ck, we first consider, for any deterministic differentiable function \(\eta\in C([0,T]\times\chi;\mathbb R^{n\times n})\), the differential-difference operator
\[ \mathcal L\eta(s,e_i)=\eta'(s,e_i)+\sum_{j=1}^{d}\lambda_{ij}\big[\eta(s,e_j)-\eta(s,e_i)\big]. \]
Then we introduce the following system of differential-difference equations, for s∈0,T:
\[
\begin{aligned}
0={}&\mathcal LM(s,e_i)+M(s,e_i)A+A^\top M(s,e_i)+\sum_{i=1}^{p}C_i^\top M(s,e_i)C_i\\
&-\Big[M(s,e_i)B+\sum_{i=1}^{p}C_i^\top M(s,e_i)D_i+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)^\top M(s,e_i)F_k(z)\,\theta_\alpha^k(dz)\Big]\Psi(s,e_i)\\
&+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)^\top M(s,e_i)E_k(z)\,\theta_\alpha^k(dz)+Q,\\
0={}&\mathcal L\bar M(s,e_i)+\bar M(s,e_i)A+A^\top\bar M(s,e_i)-\bar M(s,e_i)B\Psi(s,e_i)+\bar Q,\\
0={}&\mathcal L\Upsilon(s,e_i)+A^\top\Upsilon(s,e_i),\\
0={}&\mathcal L\varphi(s,e_i)+A^\top\varphi(s,e_i)+\big(M(s,e_i)+\bar M(s,e_i)\big)\big(b-B\psi(s,e_i)\big)\\
&+\sum_{i=1}^{p}C_i^\top M(s,e_i)\big(\sigma_i-D_i\psi(s,e_i)\big)+\sum_{k=1}^{l}\int_{\mathbb R^*}E_k(z)^\top M(s,e_i)\big(c_k(z)-F_k(z)\psi(s,e_i)\big)\,\theta_\alpha^k(dz),\\
&M(T,e_i)=G;\qquad \bar M(T,e_i)=\bar G;\qquad \Upsilon(T,e_i)=\mu_1;\qquad \varphi(T,e_i)=\mu_2,
\end{aligned}\tag{3.23}
\]
where Ψ·,· and ψ·,· are given by
\[
\begin{aligned}
\Psi(s,e_i)\triangleq{}&\Theta(s,e_i)\Big[B^\top\big(M(s,e_i)+\bar M(s,e_i)+\Upsilon(s,e_i)\big)+\sum_{i=1}^{p}D_i^\top M(s,e_i)C_i+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top M(s,e_i)E_k(z)\,\theta_\alpha^k(dz)\Big],\\
\psi(s,e_i)\triangleq{}&\Theta(s,e_i)\Big[B^\top\varphi(s,e_i)+\sum_{i=1}^{p}D_i^\top M(s,e_i)\sigma_i+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top M(s,e_i)c_k(z)\,\theta_\alpha^k(dz)\Big],
\end{aligned}\tag{3.24}
\]
with
\[ \Theta(s,\cdot)=\Big[R+\sum_{i=1}^{p}D_i^\top M(s,\cdot)D_i+\sum_{k=1}^{l}\int_{\mathbb R^*}F_k(z)^\top M(s,\cdot)F_k(z)\,\theta_\alpha^k(dz)\Big]^{-1}. \]
The following theorem presents the existence condition for a linear feedback equilibrium control.
Let (H1)–(H2) hold. Suppose that the system of equations (3.23) admits a solution \((M(\cdot,e_i),\bar M(\cdot,e_i),\Upsilon(\cdot,e_i),\varphi(\cdot,e_i))\) in \(C([0,T];\mathbb R^{n\times n})\) for every \(e_i\in\chi\). Then the time-inconsistent LQ Problem (N) has an equilibrium control that can be represented in the state-feedback form
\[ \hat u(t)=-\Psi(t,\alpha(t))\hat X(t-)-\psi(t,\alpha(t)),\tag{3.25} \]
where Ψ(·,·) and ψ(·,·) are given by (3.24).
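In general the system (3.23) must be integrated numerically backward from its terminal conditions. The sketch below is a heavily simplified illustration of that backward integration for the Riccati-type equation for M only, under assumptions not made in the paper: scalar state, one Brownian motion, no jumps, a single regime (so the difference part of the operator \(\mathcal L\) vanishes), and illustrative coefficient values. Substituting \(\tau=T-s\) turns the terminal-value problem into an initial-value problem, which is then solved by classical RK4.

```python
# Backward RK4 integration sketch for a scalar LQ Riccati equation,
# a toy stand-in for the first equation of the system (3.23).
def riccati_rhs(M, A=0.1, B=1.0, C=0.2, D=0.3, Q=1.0, R=1.0):
    """dM/ds for the scalar Riccati equation (before time reversal)."""
    num = (B * M + C * D * M) ** 2
    return -(2 * A * M + C * C * M + Q - num / (R + D * D * M))

def solve_backward(G=1.0, T=1.0, n=1000):
    """Integrate M' = riccati_rhs backward from M(T) = G; return M(0)."""
    h, M = T / n, G
    for _ in range(n):
        f = lambda m: -riccati_rhs(m)  # d(M)/d(tau) with tau = T - s
        k1 = f(M)
        k2 = f(M + 0.5 * h * k1)
        k3 = f(M + 0.5 * h * k2)
        k4 = f(M + h * k3)
        M += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return M

M0 = solve_backward()
print(M0 > 0)  # positivity is preserved here for Q, R, G > 0
```

In the regime-switching case one would instead integrate d coupled copies of such equations, one per state \(e_i\), coupled through the difference term \(\sum_j \lambda_{ij}[\eta(s,e_j)-\eta(s,e_i)]\) of \(\mathcal L\).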
Uniqueness of the equilibrium control
In this subsection, we prove that if the system of equations (3.23) is solvable, then the state feedback equilibrium control given by (3.25) is the unique open-loop Nash equilibrium control of Problem (N).
Let (H1)–(H2) hold. Suppose that M(·,·), M̄(·,·), Υ(·,·) and φ(·,·) are solutions to the system (3.23). Then \(\hat u(\cdot)\) given by (3.25) is the unique open-loop Nash equilibrium control for Problem (N).
Applications
In this section, we discuss an extension of a new class of optimization problems [36], in which the investor manages her/his wealth by consuming and investing in a financial market, subject to a mean-variance criterion controlling the final risk of the portfolio. This problem can be formulated as a time-inconsistent stochastic LQ problem and solved by the results presented in the preceding sections.
Conditional mean-variance-utility consumption–investment and reinsurance problem
We study equilibrium reinsurance (and possibly new-business), investment and consumption strategies for the mean-variance-utility portfolio problem, where the surplus of the insurer is assumed to follow a jump-diffusion model. The financial market consists of one riskless asset and one risky asset whose price processes are described by regime-switching SDEs. The problem is formulated as follows. Consider an insurer whose surplus process is described by the jump-diffusion model
\[ d\Lambda(s)=c\,ds+\beta_0\,dW_1(s)-d\Big(\sum_{i=1}^{N_\alpha(s)}Y_i\Big),\quad s\in[0,T],\tag{4.1} \]
where c>0 is the premium rate, β0 is a positive constant, W1 is a one-dimensional standard Brownian motion, Nα is a Poisson process with intensity λ>0, and \(\{Y_i\}_{i\in\mathbb N\setminus\{0\}}\) is a sequence of independent and identically distributed positive random variables with common distribution \(P_Y\) having finite first and second moments \(\mu_Y=\int_0^\infty z\,P_Y(dz)\) and \(\sigma_Y=\int_0^\infty z^2\,P_Y(dz)\). We assume that W1, Nα and \(\sum_{i=1}^{N_\alpha(\cdot)}Y_i\) are independent. Let Y be a generic random variable with the same distribution as the Yi. The premium rate c is assumed to be calculated via the expected value principle, i.e. \(c=(1+\eta)\lambda\mu_Y\) with safety loading η>0.
Note that the process ∑i=1NαsYi can also be defined through a random measure Nα1ds,dz as
∑i=1NαsYi=∫0s∫0∞zNα1dr,dz,
where Nα1 is a finite Poisson random measure with a random compensator having the form θα1dzds=λPYdzds. We recall that N˜α1ds,dz=Nα1ds,dz−θα1dzds defines the compensated jump martingale random measure of Nα1. Obviously, we have
∫0+∞zθα1dzds=λ∫0+∞zPYdzds=λμYds.
Hence (4.1) is equivalent to
dΛs=ηλμYds+β0dW1s−∫0+∞zN˜α1ds,dz.
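As a quick numerical illustration, the surplus dynamics above can be approximated with an Euler scheme: premium inflow at rate c, a Brownian perturbation, and compound Poisson claims. All parameter values and the exponential claim-size law in this sketch are hypothetical and not taken from the paper's numerical example.

```python
import numpy as np

# Euler simulation of the surplus process (4.2); parameters are illustrative.
eta, lam, mu_Y = 0.2, 0.65, 0.6        # safety loading, claim intensity, E[Y]
beta0, T, n = 0.5, 1.0, 1000
c = (1 + eta) * lam * mu_Y             # expected value principle: c = (1+eta)*lam*mu_Y
dt = T / n
rng = np.random.default_rng(0)

Lam = np.empty(n + 1)
Lam[0] = 1.0                           # initial surplus (hypothetical)
for k in range(n):
    dW = rng.normal(0.0, np.sqrt(dt))                  # Brownian increment
    n_claims = rng.poisson(lam * dt)                   # claims arriving in (t, t+dt]
    claims = rng.exponential(mu_Y, n_claims).sum()     # i.i.d. claim sizes with mean mu_Y
    Lam[k + 1] = Lam[k] + c * dt + beta0 * dW - claims
```

The compensated form of the dynamics is recovered on average, since the expected claim outflow per unit time is λμY and the net drift is c−λμY=ηλμY.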
Suppose that the insurer is allowed to invest its wealth in a financial market, in which two securities are traded continuously. One of them is a bond with price S0s at time s∈0,T governed by
dS0s=r0s,αsS0sds,S00=s0>0.
There is also a risky asset with unit price S1s at time s∈0,T governed by
dS1s=S1s−σs,αsds+βs,αsdW2s+∫−1+∞zNα2ds,dz−θα2dzds,S10=s1>0,
where r0,σ,β:0,T×X→0,∞ are assumed to be deterministic and continuous functions such that σs,αs>r0s,αs>0, W2· is a one-dimensional standard Brownian motion, and Nα2 is a finite Poisson random measure with random compensator having the form nα2ds,dz=θα2dzds. We assume that W1·, W2·, Nα1·,· and Nα2·,· are independent and that θα2· is a Lévy measure on −1,+∞ such that ∫−1+∞z2θα2dz<∞.
The insurer, starting from an initial capital x0>0 at time 0, is allowed to dynamically purchase proportional reinsurance (acquire new business), invest in the financial market and consume. A trading strategy u· is described by a three-dimensional stochastic process u1·,u2·,u3·⊤. The strategy u1s≥0 represents the retention level of reinsurance or new business acquired at time s∈0,T. We point out that u1s∈0,1 corresponds to a proportional reinsurance cover and shows that the cedent should divert part of the premium to the reinsurer at the rate of 1−u1tθ0+1λμY, where θ0 is the relative safety loading of the reinsurer satisfying θ0≥η. Meanwhile, for each claim Y occurring at time s, the reinsurer pays 1−u1tY of the claim, and the cedent pays the rest. The case u1s∈1,+∞ corresponds to acquiring new business. u2s≥0 represents the amount invested in the risky stock at time s. The dollar amount invested in the bond at time s is Xx0,ei0,u·s−u2s, where Xx0,ei0,u·· is the wealth process associated with the strategy u· and the initial states x0,ei0, and u3s represents the consumption rate at time s∈0,T. Thus, incorporating the reinsurance/new business and investment strategies into the surplus process and the risky asset, respectively, we consider, as time evolves, the controlled stochastic differential equation parametrized by t,ξ,ei∈0,T×L2Ω,Ftα,P;R×χ and satisfied by X·: for s∈0,T,
dXs=r0s,αsXs+δ+θ0u1sλμY+rs,αsu2sdsdXs=−u3sds+β0u1sdW1s+βs,αsu2sdW2sdXs=−u1s−∫0+∞zN˜α1ds,dz+u2s−∫−1+∞zN˜α2ds,dz,Xt=ξ,αt=ei,
where rs,αs=σs,αs−r0s,αs and δ=η−θ0. Then, for any t,ξ,ei∈0,T×L2Ω,Ftα,P;R×χ the mean-variance-utility consumption–investment and reinsurance optimization problem is reduced to maximization of the utility function J(t,ξ,ei;·) given by
Jt,ξ,ei;u·=E∫tT12hs−tu3(s)2ds+12VarXTFTα−μ1ξ+μ2EXTFTα,
subject to (4.5), where h·:[0,T]→R is a general deterministic nonexponential discount function satisfying h(0)=1, h(s)>0 ds-a.e. and ∫0Th(s)ds<∞. In this paper we consider general discount functions satisfying the above assumptions. Some possible examples of discount functions are considered in the literature; see [42] and [10].
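For concreteness, one admissible choice (a sketch in line with the generalized hyperbolic discounting discussed in the cited literature, not a function singled out by this paper) is

```latex
\[
h(s) = (1 + k s)^{-\beta / k}, \qquad k,\beta > 0,
\]
```

which satisfies $h(0)=1$, $h(s)>0$ and $\int_0^T h(s)\,ds<\infty$; exponential discounting $h(s)=e^{-\beta s}$ is recovered in the limit $k\to 0$, and any $k>0$ makes the criterion time-inconsistent.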
Similar to [19] and [21], due to the presence of the observable random factor α·, we consider the expectation of a conditional mean-variance criterion in the above cost functional. This is different from the mean-variance portfolio selection problem with regime switching considered in [41] and [5]. In [21], a conditional mean-variance portfolio selection problem with common noise is proposed and solved using the linear-quadratic optimal control of the conditional McKean–Vlasov equation with random coefficients and dynamic programming approach.
With n=1, p=l=m=3, the optimal control problem associated with (4.5) and (4.6) is equivalent to maximization of
Jt,ξ,ei;u.=E∫tT12hs−tΓ⊤Γu(s),usds+12VarXTFTα−μ1ξ+μ2EXTFTα,
subject to (2.1). Here A=r0s,αs, B=λμYθ0rs,αs−1, b=δλμY, D1=β000, D2=0βs,αs0, Q=0, Q¯=0, F1z=−z10,∞z00, F2z=0z1−1,∞z0, Γ=001, Rt,s=hs−tΓ⊤Γ, G=1, G¯=−1, Ci=0, σi=0, Ekz=0 and ckz=0. Thus, the above model is a special case of the general time-inconsistent LQ problem formulated earlier in this paper. Then we apply Corollary 12 and Theorem 13 to obtain the unique Nash equilibrium trading strategy. Define
ρs,αs≜λμYθ02β02+∫0+∞z2θα1dz+rs,αs2βs,αs2+∫−1+∞z2θα2dz.
Then the system (3.23) reduces to the following: for s∈0,T,
M′s,ei+Ms,ei2r0s,ei−Υs,ei+λii−ρs,eiΥs,ei+∑j≠idλijMs,ej=0,M¯′s,ei+M¯s,ei2r0s,ei−Υs,ei+λii−ρs,eiΥs,ei+∑j≠idλijM¯s,ej=0,Υ′s,ei+Υs,eir0s,ei+λii+∑j≠idλijΥs,ej=0,φ′s,ei+φs,eir0s,ei+λii+∑j≠idλijφs,ej=0,MT,ei=1,M¯T,ei=−1,ΥT,ei=−μ1,φT,ei=−μ2.
By standard arguments, we obtain, for s∈0,T and ei∈X,
Ms,ei=e∫sT2r0τ,ei−Υτ,ei+λiidτ1+∫sTe−∫τT2r0u,ei−Υu,ei+λiidu−ρτ,eiΥτ,ei+∑j≠idλijMτ,ejdτ,=M¯s,ei,
also we have, for ei∈X,
Υs,ei=e∫sTr0τ,ei+λiidτ×−μ1+∫sTe∫τT−r0u,ei+λiidu∑j≠idλijΥτ,ejdτ
and
φs,ei=e∫sTr0τ,ei+λiidτ×−μ2+∫sTe∫τT−r0u,ei+λiidu∑j≠idλijφτ,ejdτ.
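The Υ-system above is a linear backward ODE system that can also be integrated numerically from the terminal condition. The sketch below uses a hand-rolled RK4 scheme with constant coefficients for a two-state chain; the generator and the rates r0 are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

# Backward RK4 integration of the linear ODE system for Upsilon(s, e_i):
#   Upsilon'(s,e_i) + Upsilon(s,e_i)(r0(s,e_i) + lambda_ii)
#                   + sum_{j != i} lambda_ij Upsilon(s,e_j) = 0,
#   Upsilon(T,e_i) = -mu1.
T, mu1, n = 1.0, 1.0, 2000
r0 = np.array([0.35, 0.40])                 # r0(., e1), r0(., e2)  (assumed constant)
H = np.array([[-2.0, 2.0], [4.0, -4.0]])    # generator of the chain (hypothetical)
dt = T / n

def f(Y):
    # Upsilon'_i = -r0_i * Upsilon_i - (H @ Upsilon)_i
    return -(r0 * Y + H @ Y)

Y = np.array([-mu1, -mu1])                  # terminal condition at s = T
for _ in range(n):                          # classical RK4 with step -dt
    k1 = f(Y)
    k2 = f(Y - 0.5 * dt * k1)
    k3 = f(Y - 0.5 * dt * k2)
    k4 = f(Y - dt * k3)
    Y = Y - dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
Upsilon0 = Y                                # Upsilon(0, e1), Upsilon(0, e2)
```

By Feynman–Kac, each component of −Υ(0,·) equals a conditional expectation of the accumulated discount factor, so it must lie between the two pure-regime values e^{0.35T} and e^{0.40T}, which gives a quick sanity check on the integration.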
In view of Theorem 13, the Nash equilibrium control (3.25) gives, for s∈0,T, uˆ1s=−∑i=1dαs−,eiλμYθ0β02+∫0+∞z2θα1dzΦ1s,eiXˆs+Φ2s,ei,uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2+∫−1+∞z2θα2dzΦ1s,eiXˆs+Φ2s,ei,uˆ3s=∑i=1dαs−,eiΥs,eiXˆs+φs,ei, where ∀s,ei∈0,T×XΦ1s,ei=e∫sT−r0τ,ei+Υτ,eidτ−μ1+∫sTe∫τT−r0u,ei+λiidu∑j≠idλijΥτ,ejdτ1+∫sTe−∫τT2r0u,ei−Υu,ei+λiidu−ρτ,eiΥτ,ei+∑j≠idλijMτ,ejdτ,
and
Φ2s,ei=e∫sT−r0τ,ei+Υτ,eidτ−μ2+∫sTe∫τT−r0u,ei+λiidu∑j≠idλijφτ,ejdτ1+∫sTe−∫τT2r0u,ei−Υu,ei+λiidu−ρτ,eiΥτ,ei+∑j≠idλijMτ,ejdτ.
The conditional expectation of the corresponding equilibrium wealth process solves the equation
dEXˆsFTα=P1s,αsEXˆsFTα+P2s,αsds,EXˆ0FTα=x0,
where
P1s,αs=r0s,αs−ρs,αsΦ1s,αs−Υs,αs,P2s,αs=−ρs,αsΦ2s,αs−φs,αs+bs,αs.
Technical computations show that
dEXˆs2FTα=2P1s,αs+P3s,αsEXˆs2FTαdEXˆs2FTα=+2P2s,αs+P4s,αsEXˆsFTαdEXˆs2FTα=+P5s,αsds,EXˆ02FTα=x02,
and
dVarXˆsFTα=2P1s,αsVarXˆsFTα+P3s,αsEXˆs2FTα+2P4s,αsEXˆsFTα+P5s,αsds,VarXˆ0FTα=0,
where
P3s,αs=ρs,αsΦ1s,αs2,P4s,αs=ρs,αsΦ1s,αsΦ2s,αs,P5s,αs=ρs,αsΦ2s,αs2.
Then
EXˆsFTα=∑i=1dαs−,eie∫0sP1τ,eidτEXˆsFTα=×x0+∫0se∫0τ−P1u,eiduP2τ,eidτ,EXˆs2FTα=∑i=1dαs−,eie∫0s2P1τ,ei+P3τ,eidτEXˆs2FTα=×x02+∫0se∫0τ−2P1u,ei+P3u,eiduEXˆs2FTα=×2P2τ,ei+P4τ,eiEXˆτFTα+P5τ,eidτ,
and
VarXˆsFTα=∑i=1dαs−,eie∫0s2P1τ,eidτ∫0se∫0τ−2P1u,eiduP3τ,eiEXˆτ2FTα+2P4τ,eiEXˆτFTα+P5τ,eidτ.
Hence the objective function value for the equilibrium trading strategy uˆ· is
J0,x0,ei0;uˆ·=E∑i=1dαT,ei∫0T12hsΥs,eiXˆs+φs,ei2ds+12e∫0T2P1τ,eidτ∫0Te∫0τ−2P1u,eiduP3τ,eiEXˆτ2FTα+2P4τ,eiEXˆτFTα+P5τ,eidτ−μ1x0+μ2e∫0TP1τ,eidτx0+∫0Te∫0τ−P1u,eiduP2τ,eidτ.
Conditional mean-variance investment and reinsurance strategies
In this subsection, we will address a special case where the insurer does not take into account the consumption strategy. The objective is to maximize the conditional expectation of terminal wealth EXTFTα and at the same time to minimize the conditional variance of the terminal wealth VarXTFTα, over controls u· valued in R2. Then, the mean-variance investment and reinsurance optimization problem is defined as minimizing the cost Jt,ξ,ei;· given by
Jt,ξ,ei;u·=12EVarXTFTα−μ1ξ+μ2EXTFTα,
subject to, for s∈0,T,
dXs=r0s,αsXs+δ+θ0u1sλμY+rs,αsu2sdsdXs=+β0u1sdW1s+βs,αsu2sdW2sdXs=−u1s−∫0+∞zN˜α1ds,dz+u2s−∫−1+∞zN˜α2ds,dz,Xt=ξ,αt=ei,
where t,ξ,ei∈0,T×L2Ω,Ftα,P;R×χ and u·=u1·,u2·⊤ is an admissible trading strategy.
In this case, the equilibrium strategy given by the expressions (4.10) and (4.11) changes to, for s∈0,T, uˆ1s=−∑i=1dαs−,eiλμYθ0β02+∫0+∞z2θα1dzΦ1s,eiXˆs+Φ2s,ei,uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2+∫−1+∞z2θα2dzΦ1s,eiXˆs+Φ2s,ei, where ∀s,ei∈0,T×XΦ1s,ei=e∫sT−r0τ,eidτ−μ1+∫sTe∫τT−r0u,eidu∑j≠idλijΥτ,ejdτ1+∫sTe−∫τT2r0u,ei+λiidu−ρτ,eiΥτ,ei+∑j≠idλijMτ,ejdτ,Φ2s,ei=e∫sT−r0τ,eidτ−μ2+∫sTe∫τT−r0u,eidu∑j≠idλijφτ,ejdτ1+∫sTe−∫τT2r0u,ei+λiidu−ρτ,eiΥτ,ei+∑j≠idλijMτ,ejdτ.
Numerical example. In this subsection we provide a numerical example to demonstrate the validity and performance of the proposed study in solving the mean-variance problem with Markov switching. For simplicity, let us consider Equation (4.16), in which the Markov chain takes two possible states e1=1 and e2=2, i.e. χ=1,2, with the generator of the Markov chain being
H=(−2, 2; 4, −4)
and the initial condition X0=1.1. For illustration purposes, we assume the finite time horizon is T=60 and that the coefficients of the dynamic equation are given in the following table:
        r0(αt)   r(αt)   β(αt)   δ      θ0    β0    λ      μY
αt=1    0.35     0.20    0.30    0.09   1.5   0.5   0.65   0.6
αt=2    0.40     0.25    0.55    0.09   1.5   0.5   0.65   0.6
We consider the cost function defined by Equation (4.15) with μ1=μ2=1. For brevity, we use the notation EX(t,i) for EXˆtFTi, where i=1,2 and α.
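A sample path of the chain α(·) underlying this example can be generated from its exponential holding times. In the sketch below, the exit rates 2 (state 1) and 4 (state 2) are read off the generator H above; the random seed is arbitrary.

```python
import numpy as np

# Simulate the two-state Markov chain alpha(.) on [0, 60].
rng = np.random.default_rng(1)
rates = {1: 2.0, 2: 4.0}               # exit rates taken from the generator H
T = 60.0
t, state = 0.0, 1                      # alpha(0) = 1
jump_times, states = [0.0], [1]
while True:
    t += rng.exponential(1.0 / rates[state])   # exponential holding time
    if t >= T:
        break
    state = 3 - state                          # two states: every jump switches 1 <-> 2
    jump_times.append(t)
    states.append(state)
```

The resulting piecewise-constant path plays the role of the switching signal driving the regime-dependent coefficients in the wealth dynamics.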
Figure 1: The state change of the Markov chain.
Figure 2: Expected equilibrium wealth in the three modes for i=1,2 and α.
Figure 3: Trajectories of the equilibrium wealth corresponding to the Markov chain.
Figure 1 depicts the state change of the Markov chain α(·) between 0 and 60 units of time, where the initial state is assumed to be α(0)=1.
Figure 2 presents the curves of the different state trajectories of the equilibrium expected wealth EX(t,i) in the three modes: i=1, i=2 and i=αt. By using Matlab’s ODE solvers (in particular the function ode45) and the Markov chain α·, we obtain the trajectories of EX(t,1), EX(t,2) and EX(t,αt) and their graphs: the dashed blue line is the graph of EX(t,1), the continuous brown line is the graph of EX(t,2), and the solid black line is the graph of EX(t,αt), whose values switch between the dashed blue line and the continuous brown line.
Figure 3 shows the state trajectory of the equilibrium wealth X(·). In fact, when α0=1, X(0)=1.1 is the initial state. The values then switch between two paths, which are the trajectories of the equilibrium wealth corresponding to the different states of the Markov chain: αt=1 and αt=2. As a result, by comparing with Figure 1, we can clearly see how the Markovian switching influences the overall behavior of the state trajectories of the equilibrium wealth.
Special cases and relationship to other worksClassical Cramér–Lundberg model
Now, assume that the insurer’s surplus is modelled by the classical Cramér–Lundberg (CL) model (i.e. the model (4.2) with β0=0), and that the financial market consists of one risk-free asset whose price process is given by (4.3), and only one risky asset whose price process does not have jumps and is modelled by a diffusion process (i.e. the model (4.4) with z=0,ds-a.e.). Then the dynamics of the wealth process X·=Xt,ξ,ei·;u· which corresponds to an admissible strategy u·=u1·,u2·⊤ and initial triplet t,ξ,ei∈0,T×L2Ω,Ftα,P;R×X can be described, for s∈t,T, by
dXs=r0s,αsXs+δ+θ0u1sλμY+rs,αsu2sdsdXs=+βs,αsu2sdW2s−u1s−∫0+∞zN˜α1ds,dz,Xt=ξ,αt=ei.
We derive the equilibrium strategy for the following two cases.
Case 1: μ1=0. We suppose that μ1=0 and μ2=1γ, such that γ>0. Then the minimization problem (4.15) reduces to
minJt,ξ,ei;u·=E12VarXTFTα−1γEXTFTα,
subject to u·∈LF,p20,T;R2, where X·=Xt,ξ,ei·;u· satisfies (4.21), for every t,xt,ei∈0,T×L2Ω,Ftα,P;R×χ. In this case the equilibrium reinsurance–investment strategy given by (4.17) and (4.18) for s∈0,T becomes uˆ1s=−∑i=1dαs−,eiλμYθ0∫0+∞z2θα1dzΦ1s,eiXˆs+Φ2s,ei,uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2Φ1s,eiXˆs+Φ2s,ei, where Φ1s,ei and Φ2s,ei are given by (4.19) and (4.20) for μ1=0 and μ2=1γ.
In the absence of the Markov chain, i.e. when d=1, ℓs,αs≡ℓs for ℓ=r0,r and β, the equilibrium solution (4.23) and (4.24) for s∈0,T reduces to
uˆ1s=λμYθ0e∫sT−r0τdτγ∫0+∞z2θ1dz,uˆ2s=rse∫sT−r0τdτγβs2.
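These closed forms are straightforward to evaluate. The sketch below assumes constant coefficients; all numerical values, including the second claim moment σY (so that ∫0∞z²θ1(dz)=λσY), are hypothetical illustrations.

```python
import numpy as np

# Closed-form equilibrium reinsurance/investment without regime switching.
lam, mu_Y, sigma_Y = 0.65, 0.6, 0.72   # claim intensity, E[Y], E[Y^2] (assumed)
theta0, gamma = 1.5, 2.0               # reinsurer loading, risk aversion
r0, r, beta = 0.35, 0.20, 0.30         # constant market coefficients
T = 1.0

def u1(s):
    disc = np.exp(-r0 * (T - s))       # e^{-int_s^T r0 dtau} for constant r0
    return lam * mu_Y * theta0 * disc / (gamma * lam * sigma_Y)

def u2(s):
    disc = np.exp(-r0 * (T - s))
    return r * disc / (gamma * beta ** 2)
```

Both strategies are deterministic, decreasing in the risk aversion γ, and discounted back from the horizon by the risk-free rate, which matches the structure of the displayed formulas.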
It is worth pointing out that the above equilibrium solutions are identical to the ones found in Zeng and Li [43] by solving some extended HJB equations.
Case 2: μ2=0. Now, suppose that μ1=1γ and μ2=0, such that γ>0. Then the minimization problem (4.15) reduces to
minJt,ξ,ei;u·=E12VarXTFTα−ξγEXTFTα,
for any t,xt,ei∈0,T×L2Ω,Ftα,P;R×χ. This is the case of the mean-variance problem with state dependent risk aversion. For this case the equilibrium reinsurance–investment strategy given by (4.17) and (4.18) for s∈0,T, reduces to uˆ1s=−∑i=1dαs−,eiλμYθ0∫0+∞z2θα1dzΦ1s,eiXˆs+Φ2s,ei,uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2Φ1s,eiXˆs+Φ2s,ei, where Φ1s,ei and Φ2s,ei are given by (4.19) and (4.20) for μ1=1γ and μ2=0.
In the absence of the Markov chain the equilibrium solution reduces for s∈0,T to uˆ1s=λμYθ0e∫sT−r0τdτXˆs∫0+∞z2θ1dzγ+∫sTe−∫τTr0uduρτdτ,uˆ2s=rse∫sT−r0τdτXˆsβs2γ+∫sTe−∫τTr0uduρτdτ.
The equilibrium reinsurance–investment solution presented above is comparable to that found in Li and Li [17], in which the equilibrium is, however, defined within the class of feedback controls. Note that in [17] the authors adopted the approach developed by Björk et al. [4] and obtained feedback equilibrium solutions via some well-posed integral equations.
The investment only
In this subsection, we consider the investment-only optimization problem. In this case the insurer does not purchase reinsurance or acquire new business, which means that u1s≡1, and his consumption is not taken into account. We assume that the financial market consists of one risk-free asset whose price process is given by (4.3), and only one risky asset whose price process does not have jumps. A trading strategy u· reduces to a one-dimensional stochastic process u2· in this case, where u2s represents the amount invested in the risky stock at time s. The dynamics of the wealth process X· which corresponds to an admissible investment strategy u2· and initial triplet t,ξ,ei∈0,T×L2Ω,Ftα,P;R×X can be described by
dXs=r0s,αsXs+δλμY+rs,αsu2sds+β0dW1sdXs=+βs,αsu2sdW2s−∫0+∞zN˜α1ds,dz,fors∈t,T,Xt=ξ,αt=ei.
Similar to the previous subsection, for the investment-only case we derive the equilibrium strategy which is described in the following two cases.
Case 1: μ1=0. We suppose that μ1=0 and μ2=1γ, such that γ>0. In this case the equilibrium investment strategy given by (4.17) becomes
uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2Φ1s,eiXˆs+Φ2s,ei,s∈0,T,
where Φ1s,ei and Φ2s,ei are given by (4.19) and (4.20) for μ1=0 and μ2=1γ.
In the absence of the Markov chain the equilibrium solution reduces to
uˆ2s=rse∫sT−r0τdτγβs2,s∈0,T.
This essentially covers the solution obtained by Björk and Murgoci [3] by solving some extended HJB equations.
Case 2: μ2=0. Now, suppose that μ1=1γ and μ2=0, such that γ>0. This is the case of the mean-variance problem with state-dependent risk aversion. For this case the equilibrium investment strategy given by (4.17) reduces to
uˆ2s=−∑i=1dαs−,eirs,eiβs,ei2Φ1s,eiXˆs+Φ2s,ei,s∈0,T,
where Φ1s,ei and Φ2s,ei are given by (4.19) and (4.20) for μ1=1γ and μ2=0.
In the absence of the Markov chain the equilibrium solution reduces to
uˆ2s=rse∫sT−r0τdτXˆsβs2γ+∫sTe−∫τTr0uduρτdτ,s∈0,T.
This essentially covers the solution obtained by Hu et al. [15].
Conclusion
In this paper, we have considered a class of dynamic decision models of conditional time-inconsistent LQ type, under the effect of a Markovian regime-switching. We have employed the game theoretic approach to handle the time inconsistency. Throughout this study open-loop Nash equilibrium strategies are established as an alternative to optimal strategies. This was achieved using a stochastic system that includes a flow of forward-backward stochastic differential equations under equilibrium conditions. The inclusion of concrete examples in mathematical finance confirms the validity of our proposed study. The work may be developed in different ways:
The methodology may be expanded, for example, to a non-Markovian framework, implying that the coefficients of the controlled SDE as well as the coefficients of the objective functional are random. The research on this topic is in progress and will be covered in our forthcoming paper.
As the reviewer suggests, the model discussed in this paper may be extended to progressively measurable controls as an alternative to predictable ones, and the question of how to obtain the corresponding state feedback equilibrium strategy is a very interesting and challenging one (see [29] for more details). Some further investigations will be carried out in our future publications.
Acknowledgments
We would like to thank the anonymous reviewer and the Editor for their constructive comments and suggestions on an earlier version of this paper, which led to a considerable improvement of the presentation of the work.
AppendixProofs and technical results
As the coefficients are affected by random Markov switching and since we consider a family of a continuum of random variables (conditional expectations) parametrized by ε>0, the limit in (3.2) is taken along a sequence εn tending to 0, rather than ε tending to 0; see Definition 4. Due to the uncountable cardinality of the set of ε>0, the a.s. limit with respect to all ε>0 may not make sense, which is the reason for using εn instead. We should consider a subsequence for the limit procedures in the proofs. To do so, we use the following lemma, which was proved by Wang in [32], Lemma 3.3.
If f·=(f1·,…,fm·)∈LFp(0,T;Rm) with m∈N and p>1, then for dt-a.e. t there exists a sequence {εnt}n∈N⊂(0,T−t) depending on t such that limn→∞εnt=0 and limn→∞1εntE∫tt+εntfis−fitpds=0, for i=1,…,m, dP-a.s.
It is clear that ϕs,αs is invertible for all s∈0,T. We denote by ϕs,αs−1 the inverse of ϕs,αs. Define for t∈0,T and s∈t,T the process
p¯s;t≡−ϕs,αsps;t−G¯EXˆTFTα−μ1Xˆt−μ2−∫sTϕτ,ατQ¯EXˆτFταdτ,
and q¯is;t,r¯ks,z;t,l¯js;t=−ϕs,αsqis;t,rks,z;t,ljs;t, for i=1,2,…,p; k=1,2,…,l and j=1,2,…,d. Then for any t∈0,T, in the interval t,T, the 4-tuple p¯·;t,q¯·;t,r¯·,·;t,l¯·;t satisfies
dp¯s;t=−∑i=1pϕs,αsCi⊤ϕs,αs−1q¯is;tdp¯s;t=+∑k=1l∫R∗ϕs,αsEkz⊤ϕs,αs−1r¯ks,z;tθαkdzdp¯s;t=+ϕs,αsQXˆsds+∑i=1pq¯is;tdWisdp¯s;t=+∑k=1l∫R∗r¯ks−,z;tN˜αkds,dz+∑j=1dl¯js,tdΦ˜js,p¯T;t=GXˆT.
Moreover, it is clear that for any t1,t2,s∈0,T such that 0<t1<t2<s<T, we have
p¯s;t1,q¯is;t1,r¯ks,z;t1,l¯js;t1=p¯s;t2,q¯is;t2,r¯ks,z;t2,l¯js;t2.
Hence, the solution p¯·;t,q¯·;t,r¯·,·;t,l¯·;t does not depend on t. Thus we denote the solution of (A.1) by p¯·,q¯·,r¯·,·,l¯·.
We have then, for any t∈0,T and s∈t,T,
ps;t=−ϕs,αs−1p¯s+G¯EXˆTFTα+μ1Xˆt+μ2+∫sTϕτ,ατQ¯EXˆτFταdτ,
and qis;t,rks,z;t,ljs;t=−ϕs,αs−1q¯is,r¯ks,z,l¯js for i=1,2,…,p, k=1,2,…,l, and j=1,2,…,d. □
From the representation (A.2) we have, for any t∈0,T and s∈t,T,
Us;t−Us;s=B⊤ps;t−ps;s=B⊤ϕs,αs−1μ1Xˆs−Xˆt.
Moreover, since B and ϕs,αs−1 are uniformly bounded, for any a>0, t∈0,T and ε∈0,T−t, we obtain
P1εE∫tt+εUs;tds−1εE∫tt+εUs;sds≥a,≤1aE1εE∫tt+εUs;tds−1εE∫tt+εUs;sdsds,≤K1ε∫tt+εEXˆs−Xˆtds=0,
where the last equality is due to Xˆ· being right-continuous with finite left limits.
Hence, for each t there exists a sequence εntn≥0⊂0,T−t such that limn→∞εnt=0 and
limn→∞1εntE∫tt+εntUs;tds−1εntE∫tt+εntUs;sds=0,dP-a.s.
Moreover, we get from Lemma 16 that there exists a subsequence of εntn≥0, which we also denote by εntn≥0, such that
limn→∞1εntE∫tt+εntUs;sds=Ut;t,dt-a.e.,dP-a.s.
□
Proceeding with standard arguments by using Gronwall’s lemma and the moment inequalities for diffusion processes with jumps (see, e.g., Lemma 4.1 in [29]), we obtain (3.15) and (3.16).
Moreover, it follows from the dynamics of yε,v· in (3.13) that
Eyε,vsFsα=∫tsE[A(r,αr)yε,vrFrα]dr
for all s∈[t,T]. By setting Ψ(s)=A(s,αs) in Lemma A.1 in [30], we get for some positive constant C that
∫tsE[A(r,αr)yε,vrFrα]dr2≤C∫tsE[A(r,αr)yε,vrFrα]2dr,≤Cεξε,
where ξ:Ω×]0,∞[→]0,∞[ satisfies ξ(ε)↓0 as ε↓0, a.s., which proves (3.17).
Now, we consider the difference
Jt,Xˆt,αt;uε.−Jt,Xˆt,αt;uˆ.=E∫tTQXˆs+Q¯EXˆsFsα,yε,vs+Yε,vs+12Qyε,vs+Yε,vs,yε,vs+Yε,vs+12Q¯Eyε,vs+Yε,vsFsα,Eyε,vs+Yε,vsFsα+Ruˆs,v1t,t+εs+12Rv,v1t,t+εsds+12Gyε,vT+Yε,vT,yε,vT+Yε,vT+GXˆT+G¯EXˆTFTα+μ1Xˆt+μ2,yε,vT+Yε,vT+12G¯Eyε,vT+Yε,vTFTα,Eyε,vT+Yε,vTFTα.
From (H1) and (3.15)–(3.17) the following estimate follows:
E∫tT12Q¯Eyε,vs+Yε,vsFsα,Eyε,vs+Yε,vsFsαds+12G¯Eyε,vT+Yε,vTFTα,Eyε,vT+Yε,vTFTα=oε.
Then, from the terminal conditions in the adjoint equations, it follows that
Jt,Xˆt,αt;uε.−Jt,Xˆt,αt;uˆ.=E∫tTQXˆs+Q¯EXˆsFsα,yε,vs+Yε,vs+12Qyε,vs+Yε,vs,yε,vs+Yε,vs+Ruˆs,v1t,t+εs+12Rv,v1t,t+εsds−pT;t,yε,vT+Yε,vT−12PTyε,vT+Yε,vT,yε,vT+Yε,vT+oε.
Now, by applying Ito’s formula to s↦ps;t,yε,vs+Yε,vs on t,T and by taking the expectation, we get
EpT;t,yε,vT+Yε,vT=E∫tTv⊤BTps;t1t,t+εs+yε,vs+Yε,vs⊤QXˆs+Q¯EXˆsFsα+∑i=1pv⊤DiTqis1t,t+εs+∑k=1l∫R∗v⊤FkzTrks,zθαkdz1t,t+εsds.
By applying Ito’s formula to s↦Psyε,vs+Yε,vs,yε,vs+Yε,vs on t,T, we conclude from (H1) together with (3.15)–(3.17) and by taking the conditional expectation that
EPTyε,vT+Yε,vT,yε,vT+Yε,vT=E∫tTyε,vs+Yε,vs⊤Qsyε,vs+∑i=1pv⊤Di⊤PsDiv1t,t+εs+∑k=1l∫R∗v⊤Fkz⊤Ps+Γs,zFkzv1t,t+εsθαkdzds+oε.
By taking (A.6) and (A.7) in (A.5), it follows that
Jt,Xˆt,αt;uε.−Jt,Xˆt,αt;uˆ.=−E∫tt+εv⊤B⊤ps;t+∑i=1pv⊤Di⊤qis+12∑i=1pv⊤Di⊤PsDiv−v⊤Ruˆs−12v⊤Rv+∑k=1l∫R∗v⊤Fkz⊤rks,z+12Ps+ΓsFkzvθαkdzds+oε,
which is equivalent to (3.18). □
First, we have
Jt,Xˆt,αt;uε·−Jt,Xˆt,αt;uˆ·=E∫tT12QXεs+Xˆs+Q¯EXεs+XˆsFsα,Xεs−Xˆs+12Ruεs+uˆs,uεs−uˆsds+12GXεT+XˆT+G‾EXεT+XˆTFTα+2μ1Xˆt+μ2,XεT−XˆT.
By applying Itô’s formula to s↦ps;t,Xεs−Xˆs, we note thatEGXˆT+G‾EXˆTFTα+μ1Xˆt+μ2,XεT−XˆT=−E∫tTB⊤ps;t+∑i=1pDi⊤qis+∑k=1l∫R∗Fkz⊤rks,zθαkdz,uεs−uˆs+QXˆs+Q¯EXˆsFTα,Xεs−Xˆsds.
By completing the square we get
=E∫tTQ2Xεs−Xˆs2+Q¯2EXεs+XˆsFTα2ds+12∫tt+εRv+2uˆs−2(B⊤ps;t+∑i=1pDi⊤qis+∑k=1l∫R∗Fkz⊤rks,zθαkdz),vds+G2XεT−XˆT2+G‾2EXεT+XˆTFTα2,≥12E∫tt+εRv−2Us;t,vds≥−∫tt+εEUs;t,vds.
Now we can divide by εn and send εn to 0. Therefore, it follows from Lemma 8 that uˆ· is an equilibrium control. □
Suppose that uˆ· is an admissible control and denote by Xˆ· the controlled process corresponding to it. According to Corollary 12, suppose that there exists a flow of 4-tuples of adapted processes p·;·,q·;·,r·,·;·,l·;· such that Xˆ·,p·;·,q·;·,r·,·;·,l·;· satisfies the following system of regime-switching forward-backward stochastic differential equations
dXˆs=AXˆs+Buˆs+bds+∑i=1pCiXˆs+Diuˆs+σidWis+∑k=1l∫R∗EkzXˆs−+Fkzuˆs+ckzN˜αkds,dz,s∈0,T,dps;t=−A⊤ps;t+∑i=1pCi⊤qis;t+∑k=1l∫R∗Ekz⊤rks,z;tθαkdzdps;t=−QXˆs−Q¯EXˆsFsαds+∑i=1pqis;tdWisdps;t=+∑k=1l∫R∗rks,z;tN˜αkds,dz+∑j=1dljs,tdΦ˜js,s∈t,T,Xˆ0=x0,α0=ei0,pT;t=−GXˆT−G¯EXˆTFTα−μ1Xˆt−μ2,
with the equilibrium condition dP-a.s.,dt-a.e.B⊤pt;t+∑i=1pDi⊤qit+∑k=1l∫R∗Fkz⊤rkt,zθαkdz−Ruˆt=0.
Now, to solve the above system, we assume the following ansatz: for 0≤t≤s≤T, we put
ps;t=−Ms,αsXˆs−M¯s,αsEXˆsFsα−Υs,αsXˆt−φs,αs,
where M·,·, M¯·,·, Υ·,· and φ·,· are deterministic, differentiable functions which are to be determined. From the terminal condition of the adjoint process, M·,·, M¯·,·, Υ·,· and φ·,· must satisfy the following terminal boundary condition, for all ei∈χ,
MT,ei=G,M¯T,ei=G¯,ΥT,ei=μ1,φT,ei=μ2.
Applying Itô’s formula to (A.12) and using (A.10) yields
dps;t=−LMs,αsXˆs+LM‾s,αsEXˆsFsα+LΥs,αsXˆt+Lφs,αs+Ms,αsAXˆs+Buˆs+b+M¯s,αsAEXˆsFsα+BEuˆsFsα+bds−Ms,αs∑i=1pCiXˆs+Diuˆs+σidWis−Ms,αs∑k=1l∫R∗EkzXˆs−+Fkzuˆs+ckzN˜αkds,dz−∑j=1dMs,ej−Ms,αs−Xˆs+M¯s,ej−M¯s,αs−EXˆsFsα+Υs,ej−Υs,αs−Xˆt+φs,ej−φs,αs−dΦ˜j(s).
Comparing with (A.10), we deduce that, for i=1,2,…,p, k=1,2,…,l, and j=1,2,…,d,
qis;t=qis=−Ms,αsCiXˆs+Diuˆs+σi,rks,z;t=rks,z=−Ms,αsEkzXˆs−+Fkzuˆs+ckz,ljs;t=ljs=−Ms,ej−Ms,αsXˆsljs;t=ljs=−M¯s,ej−M¯s,αsEXˆsFsαljs;t=ljs=−Υs,ej−Υs,αsXˆt−φs,ej−φs,αs.
Moreover, by taking (A.12) and (A.15) in (A.11), we obtain
Ruˆt+B⊤Mt,αt+M¯t,αt+Υt,αtXˆt+B⊤φt,αt+∑i=1pDi⊤Mt,αtCiXˆt+Diuˆt+σi+∑k=1l∫R∗Fkz⊤Mt,αtEkzXˆt−+Fkzuˆt+ckθαkdz=0.
Subsequently, we obtain that uˆ· admits the following representation
uˆs=−Ψs,αsXˆs−ψs,αs,
where Ψ·,· and ψ·,· are given by (3.24).
Hence (3.25) holds, and for s∈0,T we have
EuˆsFsα=−Ψs,αsEXˆsFsα−ψs,αs.
Next, comparing the ds term in (A.14) with the ones in the second equation in (A.10), then by using the expressions (3.25) and (3.24), we obtain
0=LM+MA+A⊤M+∑i=1pCi⊤MCi−MB+∑i=1pCi⊤MDi+∑k=1l∫R∗Ekz⊤MFkzθαkdzΨs,αs+∑k=1l∫R∗Ekz⊤MEkzθαkdz+QXˆs+LM¯+M¯A−BΨ+A⊤M¯+Q¯EXˆsFsα+LΥ+A⊤ΥXˆt+Lφ+A⊤φ+Ms,αs+M¯s,αsb−Bψs,αs+∑i=1pCi⊤Mσi−Diψ+∑k=1l∫R∗Ekz⊤Mckz−Fkzψθαkdz.
This suggests that the functions M·,·, M¯·,·, Υ·,· and φ·,· solve the system of equations (3.23). In addition, we can verify that Ψ·,· and ψ·,· in (3.25) are both uniformly bounded. Then for s∈0,T the following linear SDE with jumps
dXˆs=A−BΨs,αsXˆs+b−Bψs,αsdsdXˆs=+∑i=1pCi−DiΨs,αsXˆs+σi−Diψs,αsdWisdXˆs=+∑k=1l∫R∗Ekz−FkzΨs,αsXˆs−+ckz−Fkzψs,αsN˜αkds,dz,Xˆ0=x0,α0=ei0,
has a unique solution Xˆ·∈SF20,T;Rn, and the following estimate holds
Esups∈0,TXˆs2≤K1+x02.
Hence the control uˆ· defined by (3.25) is admissible. □
Suppose that there is another equilibrium control u˜·∈LF,p20,T;Rm and denote by X˜· its corresponding controlled state process, and by p˜·;·,q˜·,r˜·,·,l˜· the corresponding unique solution to the BSDE (3.4) with Xˆ· replaced by X˜·. Then by Corollary 12 the 5-tuple (p˜·;·,q˜·,r˜·,·,l˜·,u˜·) satisfies dP-a.s.,dt-a.e.B⊤p˜t;t+∑i=1pDi⊤q˜it+∑k=1l∫R∗Fkz⊤r˜kt,zθαkdz−Ru˜t=0.
Now, we define for t∈0,T, s∈t,T, i=1,…,p, k=1,…,l, j=1,2,…,d:
pˆs;t=p˜s;t+Ms,αsX˜s+M¯s,αsEX˜sFsαpˆs;t=+Υs,αsX˜t+φs,αs,qˆis=q˜is+Ms,αsCiX˜s+Diu˜s+σis,rˆks,z=r˜ks,z+Ms,αsEkzX˜s−+Fkzu˜s−+ckz,lˆjs=l˜js+Ms,ej−Ms,αsX˜slˆjs=+M¯s,ej−M¯s,αsEX˜sFsαlˆjs=+Υs,ej−Υs,αsX˜t+φs,ej−φs,αs.
It is easy to prove that
pˆ·;t,qˆ·,rˆ·,·,lˆ·∈L×LF20,T;Rnp×LF,pθ,20,T×R∗;Rnl×LF,pλ,20,T;Rnd.
By (A.17) we have dP-a.s.,dt-a.e.−B⊤pˆt;t−Mt,αt+M¯t,αt+Υt,αtX˜t−−φt,αt−∑i=1pDi⊤qˆit−Mt,αtCitX˜t−+Diu˜t+σi−∑k=1l∫R∗Fkz⊤rˆkt,z−Mt,αtEkzX˜t−+Fkzu˜t+ckzθαkdz+Ru˜t=0.
Since Θt,αt exists dP-a.s.,dt-a.e., using (3.24), we get
u˜t=Θt,αtB⊤pˆt;t+∑i=1pDi⊤qˆit+∑k=1l∫R∗Fkz⊤rˆkt,zθαkdz−Ψt,αtX˜t−−ψt,αt.
From the above equality, we remark that if pˆt;t=qˆt=rˆt,z=0, dP-a.s.,dt-a.e., then the form of u˜· is the same as the form of the feedback control law specified by (3.25), and hence the uniqueness of the equilibrium control given by (3.25) holds. Moreover, for any t∈0,T and for any s∈t,T we have
dpˆs;t=dp˜s;t+dMs,αsX˜s+M¯s,αsEX˜sFsα+Υs,αsX˜t+φs,αs.
Using the equations for p˜·;t, X˜·, M·,·, M¯·,·, Υ·,· and φ·,·, respectively, and using equality (4.6) we find that pˆ·;·,qˆ·,rˆ·,·,lˆ· satisfies
dpˆs;t=−gs,pˆs;t,qˆs,rˆs,z,pˆs;s,Epˆs;sFsα,EqˆsFsα,Erˆs,zFsαds+∑i=1pqˆisdWisdpˆs;t=+∑k=1l∫R∗rˆks−,zN˜αkds,dz+∑j=1dlˆjsdΦ˜js,0≤t≤s≤T,pˆT;t=0,t∈0,T,
where
gs,pˆs;t,qˆs,rˆs,z,pˆs;s,Epˆs;sFsα,EqˆsFsα,Erˆs,zFsα=A⊤pˆs;t+∑i=1pCi⊤qˆis+∑k=1l∫R∗Ekz⊤rˆks,zθαkdz−Ms,αsB+∑i=1pCi⊤Ms,αsDi+∑k=1l∫R∗Ekz⊤Ms,αsFkzθαkdzΘs,αs×B⊤pˆs;s+∑i=1pDi⊤qˆis+∑k=1l∫R∗Fkz⊤rˆks,zθαkdz−M¯s,αsBΘs,αsB⊤Epˆs;sFsα+∑i=1pDi⊤EqˆisFsα−M¯s,αsBΘs,αs+∑k=1l∫R∗Fkz⊤Erˆks,zFsαθαkdz.
We will prove in the next lemma that Equation (A.19) admits at most one solution in L×LF20,T;Rnp×LF,pθ,20,T×R∗;Rnl×LF,pλ,20,T;Rnd. Thus pˆ≡0, qˆ≡0, rˆ≡0 and lˆ≡0, hence the uniqueness of the equilibrium control given by (3.25) holds. □
For the uniqueness of solution to (A.19), we have the following lemma.
Equation (A.19) admits at most one solution inL×LF20,T;Rnp×LF,pθ,20,T×R∗;Rnl×LF,pλ,20,T;Rnd.
For any t∈0,T and s∈t,T, applying Itô’s formula and taking expectations, we obtain that there exists a constant K1>0 such that
Epˆs;t2+∑i=1p∫sTqˆiτ2dτ+∑k=1l∫sT∫R∗rˆkτ,z2θαkdzdτ+∑j=1d∫sTlˆjτ2λjτdτ≤K1E∫sTpˆτ;tpˆτ;t+∑i=1pqˆiτ+∑k=1l∫R∗rˆkτ,zθαkdz+∑j=1dlˆjτλjτ+pˆτ;τ+Epˆτ;τFτα+∑i=1pEqˆiτFτα+∑k=1l∫R∗Erˆkτ,zFταθαkdz+∑j=1dElˆjτFταλjτdτ≤K2E∫sTpˆτ;t2+pˆτ;τ2dτ+12E∑i=1p∫sTqˆiτ2dτ+∑k=1l∫sT∫R∗rˆkτ,z2θαkdzdτ+∑j=1d∫sTlˆjτ2λjτdτ,
where we have used the inequality cab≤βc2a2+1βb2, valid for all β>0, a>0, b>0. Hence there exists a K3>0 such that
Epˆs;t2+∑i=1pE∫sTqˆiτ2dτ+∑k=1lE∫sT∫R∗rˆkτ,z2θαkdzdτ+∑j=1dE∫sTlˆjτ2λjτdτ≤K3E∫sTpˆτ;t2+pˆτ;τ2dτ.
Then we have, for any t∈0,T and s∈t,T,
Epˆs;t2≤K3E∫sTpˆτ;t2+pˆτ;τ2dτ,
thus
Epˆs;t2≤K3T−tsupτ∈t,TEpˆτ;t2+supτ∈t,TEpˆτ;τ2≤2K3T−tsupt≤τ≤s≤TEpˆs;τ2,
hence
supt≤τ≤s≤TEpˆs;τ2≤2K3T−tsupt≤τ≤s≤TEpˆs;τ2.
If we take ϵ=1/(8K3), we get that, for t∈T−ϵ,T and s∈t,T,
supt≤τ≤s≤TEpˆs;τ2≤14supt≤τ≤s≤TEpˆs;τ2,
hence
supt≤τ≤s≤TEpˆs;τ2=0,
which means that pˆs;τ=0, P-a.s. for all τ,s such that t≤τ≤s≤T. For t∈T−2ϵ,T−ϵ and s∈T−ϵ,T, since we have pˆτ;τ=0 for τ∈s,T, by (A.22), we have
Epˆs;t2≤K3E∫sTpˆτ;t2dτ,
and by Gronwall’s inequality we conclude that pˆs;t=0.
Now for t∈T−2ϵ,T−ϵ and s∈t,T−ϵ, since we have pˆT−ϵ;t=0, we apply the above analysis for the region t∈T−ϵ,T and s∈t,T to confirm that pˆs;τ=0, P-a.s. for all τ,s such that t≤τ≤s≤T−ϵ. We then iterate the same analysis for t∈T−3ϵ,T−2ϵ, and so on, until time t=0. Hence pˆs;t=0, P-a.s., for every t,s∈D0,T.
Finally, by (A.21) we obtain
E∫0T∑i=1pqˆiτ2+∑k=1l∫R∗rˆτ,z2θαkdz+∑j=1dlˆjτ2λjτdτ≤K3E∫0Tpˆτ;t2+pˆτ;τ2dτ=0,
which yields that qˆ≡0, rˆ≡0 and lˆ≡0. □
Existence and uniqueness of solutions to SDE and BSDE
In what follows, we will state some basic results on SDEs and BSDEs with jumps which we have used in this paper.
Let t∈0,T, denote by P the Ft-predictable σ-field on 0,T×Ω and by BH the Borel σ-algebra of any topological space H. For any given s∈0,T, consider the SDE with jumps
X(t)=ξ+∫stb(r,X(r),αr)dr+∫stσ(r,X(r),αr)dW(r)+∬R∗×(s,t]c(r,z,X(r−),αr)N˜α(dr,dz),
where s≤t≤T. Here the coefficients (ξ,b,σ,c) are given mappings ξ:Ω⟶Rn, b:[0,T]×Ω×Rn×χ⟶Rn, σ≡σ1,σ2,…,σp:[0,T]×Ω×Rn×χ⟶Rn×p, c≡c1,c2,…,cl:[0,T]×Ω×R∗×Rn×χ⟶Rn×l satisfying the assumptions below:
ξ∈L2Ω,Ft,P;Rn, the coefficients b,σ are P⊗BRn⊗Bχ measurable and c is P⊗BRn⊗B(R∗)⊗Bχ measurable and, for all ei∈χ,
E∫0Tb(t,0,ei)+σ(t,0,ei)+∫R∗c(t,z,0,ei)θαdzdt<∞;
b,σ and c are uniformly Lipschitz continuous w.r.t. x, that is, there exists a constant C>0 s.t. for all (t,x,x¯,ei)∈[0,T]×Rn×Rn×χ and a.s. ω∈Ω,
|b(t,x,ei)−b(t,x¯,ei)|2+|σ(t,x,ei)−σ(t,x¯,ei)|2+∫R∗|c(t,z,x,ei)−c(t,z,x¯,ei)|2θαdz⩽C|x−x¯|2.
If the coefficients (ξ,b,σ,c) satisfy Assumptions (H’1)–(H’2), then the SDE (A.23) has a unique solution X(·)∈SF2s,T;Rn. Moreover, the following estimate holds: Esups≤t≤TXs2≤K1+Eξ2.
Let 0=τ0<τ1<τ2<⋯<τn<⋯ be the jump times of the Markov chain α(·), and let e1∈χ be the starting state. Thus α(t)=e1 on τ0,τ1, and the system (A.23) for t∈τ0,τ1 has the form:
dX(t)=b(t,X(t),e1)dt+σ(t,X(t),e1)dW(t)+∫R∗c(t,z,X(t−),e1)N˜α(dt,dz).
By Theorem 117 in [25], the above SDE has the unique solution X(·) in the space SF2τ0,τ1;Rn, and by continuity for t=τ1 as well. By considering ατ1=e2, the system for t∈τ1,τ2 becomes
dX(t)=b(t,X(t),e2)dt+σ(t,X(t),e2)dW(t)+∫R∗c(t,z,X(t−),e2)N˜α(dt,dz).
Again, by Theorem 117 in [25], this SDE has a unique solution X(·) in the space SF2([τ1,τ2);Rn), and by continuity for t=τ2 as well. Repeating this process continuously, we obtain that the solution X(·) of system (A.23) remains in SF20,T;Rn with probability one. □
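The concatenation argument above translates directly into a simulation scheme: freeze the regime between the jump times of the chain and advance the diffusion with an Euler step under the frozen coefficients. The linear coefficients and switching rates in this sketch are illustrative assumptions.

```python
import numpy as np

# Euler scheme for a scalar regime-switching SDE, regime frozen between jumps.
rng = np.random.default_rng(2)
b = {1: 0.05, 2: 0.10}                 # per-regime drift coefficients (assumed)
sig = {1: 0.2, 2: 0.4}                 # per-regime volatilities (assumed)
rates = {1: 2.0, 2: 4.0}               # exit rates of the chain (assumed)
T, n = 1.0, 1000
dt = T / n

X, state, t = 1.0, 1, 0.0
next_jump = rng.exponential(1.0 / rates[state])
for _ in range(n):
    X += b[state] * X * dt + sig[state] * X * rng.normal(0.0, np.sqrt(dt))
    t += dt
    if t >= next_jump:                 # regime switch: redraw the holding time
        state = 3 - state
        next_jump = t + rng.exponential(1.0 / rates[state])
```

Between consecutive jump times this is exactly the Euler discretization of the fixed-regime SDE, so the piecewise construction in the proof is mirrored step by step.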
The form of linear BSDEs (3.4) and (3.5) given in Section 3.1 is the motivation for us to study the following general BSDE with Markov switching
Y(t)=ς+∫tTg(s,Y(s),Z(s),K(s,·),Vs,α(s))ds−∫tT∑i=1pZi(s)dWi(s)−∫tT∫R∗∑r=1lKr(s,z)N˜αr(ds,dz)−∫tT∑j=1dVjsdΦ˜js,t∈[0,T].
Here g:Ω×[0,T]×Rn×Rn×d×L2R∗,BR∗,θ;Rn×l×Lλ2×χ→Rn, where Lλ2 is the set of functions I(·):χ→Rn×d such that ‖I(·)‖λ2:=∑j=1d|Ij(t)|2λj(t)<∞. We make the following assumptions.
ς∈L2Ω,Ft,P;Rn.
For all (y,z,k,υ)∈Rn×Rn×d×L2R∗,BR∗,θ;Rn×l×Lλ2 and ei∈χ, i=1,…,d, g(·,y,z,k,υ,ei)∈LF20,T;Rn.
∀ei∈χ, g(t,y,z,k,υ,ei) is uniformly Lipschitz with respect to y, z, k and υ, i.e. there exists a constant C>0 such that for all (ω,t)∈Ω×[0,T], y,y′∈Rn, z,z′∈Rn×d, k,k′∈L2R∗,BR∗,θ;Rn×l, υ,υ′∈Lλ2,
g(t,y,z,k,υ,ei)−gt,y′,z′,k′,υ′,ei≤Cy−y′+z−z′+k−k′θ+υ−υ′λ.
Suppose that (H’3)–(H’5) hold. Then BSDE with Markov switching (A.24) admits a unique solution.
Before proving this theorem, we give an extended martingale representation result in the following lemma. Its proof follows from Lemma 3.1 in Cohen and Elliott [7], together with Proposition 3.2 in Shi and Wu [27].
Lett∈0,T. ForM∈L2Ω,Ft,P;Rn, there exists a unique process(Y,Z,K,V)∈SF20,T;Rn×L20,T;Rnp×LF,pθ,20,T×R∗;Rnl×LF,pλ,20,T;Rndsuch thatMt=M(0)+∫0t∑i=1pZi(s)dWi(s)+∫0t∫R∗∑r=1lKr(s,z)N˜αr(ds,dz)+∫0t∑j=1dVjsdΦ˜js.
First we note that, for all
(y,z,k,υ)∈SF20,T;Rn×L20,T;Rnp×LF,pθ,20,T×R∗;Rnl×LF,pλ,20,T;Rnd,
the following is valid:
E∫0Tg(s,y(s),z(s),k(s,·),υs,α(s))ds2≤2E∫0T(g(s,y(s),z(s),k(s,·),υs,α(s))−g(s,0,0,0,0,α(s)))ds2+2E∫0Tg(s,0,0,0,0,α(s))ds2,≤C∑i=1dE∫0T|g(s,0,0,0,0,ei)|2ds+CE∫0T|y(s)|2+|z(s)|2+‖k(s,·)‖θ2+‖υ(s)‖λ2ds<∞.
It follows that
ς+∫0Tg(s,y(s),z(s),k(s,·),υs,α(s))ds∈L2Ω,Ft,P;Rn.
From assumptions (H’3)–(H’5), it is clear that
\[
M(t)=\mathbb{E}\Big[\varsigma+\int_0^T g\big(s,y(s),z(s),k(s,\cdot),\upsilon(s),\alpha(s)\big)\,ds\ \Big|\ \mathcal F_t\Big]
\]
is a square-integrable F_t-martingale. By virtue of the martingale representation result above, there exists a unique triple
(Z,K,V)∈L^2(0,T;R^{n×p})×L_{F,p}^{θ,2}([0,T]×R^*;R^{n×l})×L_{F,p}^{λ,2}(0,T;R^{n×d})
such that
\[
M(t)=M(0)+\int_0^t\sum_{i=1}^{p}Z_i(s)\,dW_i(s)+\int_0^t\int_{\mathbb{R}^*}\sum_{r=1}^{l}K_r(s,z)\,\widetilde N_\alpha^{\,r}(ds,dz)+\int_0^t\sum_{j=1}^{d}V_j(s)\,d\widetilde\Phi_j(s).
\]
Setting
\[
Y(t):=M(t)-\int_0^t g\big(s,y(s),z(s),k(s,\cdot),\upsilon(s),\alpha(s)\big)\,ds,
\]
so that Y(T)=ς and Y∈S_F^2(0,T;R^n), we may define the mapping Δ from
S_F^2(0,T;R^n)×L^2(0,T;R^{n×p})×L_{F,p}^{θ,2}([0,T]×R^*;R^{n×l})×L_{F,p}^{λ,2}(0,T;R^{n×d})
into itself by Δ(y,z,k,υ):=(Y,Z,K,V). On this space
we introduce the norm defined by
\[
\|(y,z,k,\upsilon)\|_{\beta,\theta,\lambda}^2:=\mathbb{E}\int_0^T e^{\beta s}\big(|y(s)|^2+|z(s)|^2+\|k(s,\cdot)\|_\theta^2+\|\upsilon(s)\|_\lambda^2\big)\,ds,
\]
where β>0 is a constant to be determined later; since 1≤e^{βs}≤e^{βT} on [0,T], this norm is equivalent to the original one. We will prove that Δ is a contraction mapping under the norm ‖·‖_{β,θ,λ}. For this purpose, let
(y,z,k,υ), (y′,z′,k′,υ′)∈S_F^2(0,T;R^n)×L^2(0,T;R^{n×p})×L_{F,p}^{θ,2}([0,T]×R^*;R^{n×l})×L_{F,p}^{λ,2}(0,T;R^{n×d}),
and set (Y,Z,K,V)=Δ(y,z,k,υ) and (Y′,Z′,K′,V′)=Δ(y′,z′,k′,υ′). Write
\[
(\hat y,\hat z,\hat k,\hat\upsilon)=(y-y',\,z-z',\,k-k',\,\upsilon-\upsilon'),\qquad
(\hat Y,\hat Z,\hat K,\hat V)=(Y-Y',\,Z-Z',\,K-K',\,V-V').
\]
We know that (Ŷ,Ẑ,K̂,V̂) belongs to the same product space and that E[sup_{0≤t≤T}|Ŷ(t)|^2]<∞. Note that
\[
\hat Y(t)=\int_t^T\big(g(s,y(s),z(s),k(s,\cdot),\upsilon(s),\alpha(s))-g(s,y'(s),z'(s),k'(s,\cdot),\upsilon'(s),\alpha(s))\big)\,ds
-\int_t^T\hat Z(s)\,dW(s)-\int_t^T\int_{\mathbb{R}^*}\hat K(s,z)\,\widetilde N_\alpha(ds,dz)-\int_t^T\hat V(s)\,d\widetilde\Phi(s),\qquad t\in[0,T].
\]
Applying Itô's formula to e^{βs}|Ŷ(s)|², we obtain
\[
\begin{aligned}
&\mathbb{E}|\hat Y(0)|^2+\mathbb{E}\int_0^T\big(\beta|\hat Y(s)|^2+|\hat Z(s)|^2+\|\hat K(s,\cdot)\|_\theta^2+\|\hat V(s)\|_\lambda^2\big)e^{\beta s}\,ds\\
&\quad=\mathbb{E}\int_0^T 2\hat Y(s)\big(g(s,y(s),z(s),k(s,\cdot),\upsilon(s),\alpha(s))-g(s,y'(s),z'(s),k'(s,\cdot),\upsilon'(s),\alpha(s))\big)e^{\beta s}\,ds\\
&\quad\le 2C\,\mathbb{E}\int_0^T|\hat Y(s)|\big(|\hat y(s)|+|\hat z(s)|+\|\hat k(s,\cdot)\|_\theta+\|\hat\upsilon(s)\|_\lambda\big)e^{\beta s}\,ds\\
&\quad\le \frac12\,\mathbb{E}\int_0^T\big(|\hat y(s)|^2+|\hat z(s)|^2+\|\hat k(s,\cdot)\|_\theta^2+\|\hat\upsilon(s)\|_\lambda^2\big)e^{\beta s}\,ds
+8C^2\,\mathbb{E}\int_0^T|\hat Y(s)|^2 e^{\beta s}\,ds.
\end{aligned}
\]
Choosing β=1+8C² and absorbing the last term into the left-hand side, we get
\[
\mathbb{E}\int_0^T\big(|\hat Y(s)|^2+|\hat Z(s)|^2+\|\hat K(s,\cdot)\|_\theta^2+\|\hat V(s)\|_\lambda^2\big)e^{\beta s}\,ds
\le\frac12\,\mathbb{E}\int_0^T\big(|\hat y(s)|^2+|\hat z(s)|^2+\|\hat k(s,\cdot)\|_\theta^2+\|\hat\upsilon(s)\|_\lambda^2\big)e^{\beta s}\,ds,
\]
i.e.
\[
\|(\hat Y,\hat Z,\hat K,\hat V)\|_{\beta,\theta,\lambda}^2\le\frac12\,\|(\hat y,\hat z,\hat k,\hat\upsilon)\|_{\beta,\theta,\lambda}^2.
\]
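The elementary inequality used to split each cross term above (with u standing for |Ŷ(s)| and a for any one of the four Lipschitz differences) is a Young-type bound:

```latex
2C\,u\,a \;\le\; 2C^{2}u^{2}+\tfrac{1}{2}\,a^{2},
\qquad\text{since}\qquad
\Big(\sqrt{2}\,C\,u-\tfrac{a}{\sqrt{2}}\Big)^{2}\;\ge\;0 .
```

Applying it to each of the four difference terms and summing gives the quadratic bound on the right-hand side.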
Hence Δ is a strict contraction on
S_F^2(0,T;R^n)×L^2(0,T;R^{n×p})×L_{F,p}^{θ,2}([0,T]×R^*;R^{n×l})×L_{F,p}^{λ,2}(0,T;R^{n×d}).
By the Banach fixed-point theorem, Δ admits a unique fixed point, which is the unique solution of (A.24). The proof is complete. □
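The contraction argument is constructive: iterating Δ from any starting point converges geometrically to the fixed point. A toy deterministic sketch of this Picard scheme, for a backward equation y(t) = ξ + ∫_t^T g(y(s)) ds with a Lipschitz generator standing in for (H'5) (the function names and coefficients below are purely illustrative, not from the paper):

```python
def picard_solve_backward(g, xi, T=1.0, n=200, iters=30):
    """Picard iteration for the toy backward equation
        y(t) = xi + int_t^T g(y(s)) ds
    on a uniform grid: the deterministic analogue of iterating Delta."""
    dt = T / n
    y = [0.0] * (n + 1)                      # arbitrary initial guess
    for _ in range(iters):
        y_new = [0.0] * (n + 1)
        y_new[n] = xi                        # terminal condition
        for k in range(n - 1, -1, -1):
            # generator frozen at the previous iterate, as in the map Delta
            y_new[k] = y_new[k + 1] + g(y[k]) * dt
        y = y_new
    return y

# Example: g(y) = -0.5*y, xi = 1, so the exact solution is y(t) = exp(-0.5*(T - t))
y = picard_solve_backward(lambda v: -0.5 * v, 1.0)
```

With Lipschitz constant C = 0.5 and T = 1, each sweep contracts the error by a factor of roughly CT = 0.5, so thirty sweeps suffice; y[0] then approximates exp(-1/2) ≈ 0.6065 up to the discretization error of the grid.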
References
[1] Basak, S., Chabakauri, G.: Dynamic mean-variance asset allocation. 23, 2970–3016 (2010). https://doi.org/10.1093/rfs/hhq028
[2] Bensoussan, A., Sung, K.C.J., Yam, S.C.P.: Linear-quadratic time-inconsistent mean field games. 3(4), 537–552 (2013). MR3127149. https://doi.org/10.1007/s13235-013-0090-y
[3] Björk, T., Khapko, M., Murgoci, A.: On time-inconsistent stochastic control in continuous time. 21, 331–360 (2017). MR3626618. https://doi.org/10.1007/s00780-017-0327-5
[4] Björk, T., Murgoci, A., Zhou, X.Y.: Mean-variance portfolio optimization with state-dependent risk aversion. 24(1), 1–24 (2014). MR3157686. https://doi.org/10.1111/j.1467-9965.2011.00515.x
[5] Chen, P., Yang, H., Yin, G.: Markowitz's mean-variance asset-liability management with regime switching: a continuous-time model. 43(3), 456–465 (2008). MR2479605. https://doi.org/10.1016/j.insmatheco.2008.09.001
[6] Chen, P., Yang, H.: Markowitz's mean-variance asset-liability management with regime switching: a multi-period model. 18(1), 29–50 (2011). MR2786975. https://doi.org/10.1080/13504861003703633
[7] Cohen, S.N., Elliott, R.J.: Solutions of backward stochastic differential equations on Markov chains. 2, 251–262 (2008). MR2446692. https://doi.org/10.31390/cosa.2.2.05
[8] Czichowsky, C.: Time-consistent mean-variance portfolio selection in discrete and continuous time. 17(2), 227–271 (2013). MR3038591. https://doi.org/10.1007/s00780-012-0189-9
[9] Delong, Ł., Gerrard, R.: Mean-variance portfolio selection for a nonlife insurance company. 66, 339–367 (2007). MR2342219. https://doi.org/10.1007/s00186-007-0152-2
[10] Ekeland, I., Mbodji, O., Pirvu, T.A.: Time-consistent portfolio management. 3, 1–32 (2012). MR2968026. https://doi.org/10.1137/100810034
[11] Ekeland, I., Pirvu, T.A.: Investment and consumption without commitment. 2, 57–86 (2008). MR2461340. https://doi.org/10.1007/s11579-008-0014-6
[12] Elliott, R.J., Aggoun, L., Moore, J.B.: Springer, New York (1994). MR1323178
[13] Goldman, S.M.: Consistent plans. 47, 533–537 (1980)
[14] Hu, Y., Jin, H., Zhou, X.: Time-inconsistent stochastic linear-quadratic control: characterization and uniqueness of equilibrium. 55(2), 1261–1279 (2017). MR3639569. https://doi.org/10.1137/15M1019040
[15] Hu, Y., Jin, H., Zhou, X.Y.: Time-inconsistent stochastic linear quadratic control. 50(3), 1548–1572 (2012). MR2968066. https://doi.org/10.1137/110853960
[16] Krusell, P., Smith, A.: Consumption and savings decisions with quasi-geometric discounting. 71, 366–375 (2003)
[17] Li, Y., Li, Z.: Optimal time-consistent investment and reinsurance strategies for mean-variance insurers with state-dependent risk aversion. 53(1), 86–97 (2013). MR3081464. https://doi.org/10.1016/j.insmatheco.2013.03.008
[18] Liang, Z., Song, M.: Time-consistent reinsurance and investment strategies for mean-variance insurer under partial information. 65, 66–76 (2015). MR3430397. https://doi.org/10.1016/j.insmatheco.2015.08.008
[19] Nguyen, S.L., Yin, G., Nguyen, D.T.: A general stochastic maximum principle for mean-field controls with regime switching. 84, 3255–3294 (2021). MR4308229. https://doi.org/10.1007/s00245-021-09747-x
[20] Øksendal, B., Sulem, A.: 2nd edn. Springer, New York (2007). MR2322248. https://doi.org/10.1007/978-3-540-69826-5
[21] Pham, H.: Linear quadratic optimal control of conditional McKean–Vlasov equation with random coefficients and applications. 1, 7 (2016). MR3583182. https://doi.org/10.1186/s41546-016-0008-x
[22] Peng, S.: A general stochastic maximum principle for optimal control problems. 28, 966–979 (1990). MR1051633. https://doi.org/10.1137/0328054
[23] Phelps, E.S., Pollak, R.A.: On second-best national saving and game-equilibrium growth. 35, 185–199 (1968). https://doi.org/10.2307/2296547
[24] Pollak, R.: Consistent planning. 35, 185–199 (1968)
[25] Rong, S.: Springer, New York (2006). MR2160585
[26] Shen, Y., Siu, T.K.: The maximum principle for a jump-diffusion mean-field model and its application to the mean-variance problem. 86, 58–73 (2013). MR3053556. https://doi.org/10.1016/j.na.2013.02.029
[27] Shi, J., Wu, Z.: Backward stochastic differential equations with Markov switching driven by Brownian motion and Poisson random measure. 87(1), 1–29 (2015). MR3306809. https://doi.org/10.1080/17442508.2014.914514
[28] Strotz, R.: Myopia and inconsistency in dynamic utility maximization. 23, 165–180 (1955). https://doi.org/10.2307/2295722
[29] Song, Y., Tang, S., Wu, Z.: The maximum principle for progressive optimal stochastic control problems with random jumps. 58(4), 2171–2187 (2020). MR4127097. https://doi.org/10.1137/19M1292308
[30] Sun, Z., Guo, X.: Equilibrium for a time-inconsistent stochastic linear-quadratic control system with jumps and its application to the mean-variance problem. 181(2), 383–410 (2019). MR3938474. https://doi.org/10.1007/s10957-018-01471-x
[31] Tang, S., Li, X.: Necessary conditions for optimal control of stochastic systems with random jumps. 32(5), 1447–1475 (1994). MR1288257. https://doi.org/10.1137/S0363012992233858
[32] Wang, T.: Uniqueness of equilibrium strategies in dynamic mean-variance problems with random coefficients. 490(1), 124199 (2020). MR4099907. https://doi.org/10.1016/j.jmaa.2020.124199
[33] Wang, H., Wu, Z.: Partially observed time-inconsistency recursive optimization problem and application. 161(2), 664–687 (2014). MR3193813. https://doi.org/10.1007/s10957-013-0326-4
[34] Wei, J., Wong, K.C., Yam, S.C.P., Yung, S.P.: Markowitz's mean-variance asset-liability management with regime switching: a time-consistent approach. 53(1), 281–291 (2013). MR3081480. https://doi.org/10.1016/j.insmatheco.2013.05.008
[35] Wu, Z., Wang, X.: FBSDE with Poisson process and its application to linear quadratic stochastic optimal control problem with random jumps. 29, 821–826 (2003). MR2033363
[36] Yang, B.Z., He, X.J., Zhu, S.P.: Continuous-time mean-variance-utility portfolio problem and its equilibrium strategy. Optimization. MR3175527. https://doi.org/10.1080/02331934.2021.1939339
[37] Yong, J.: A deterministic linear quadratic time-inconsistent optimal control problem. 1, 83–118 (2011). MR2822686. https://doi.org/10.3934/mcrf.2011.1.83
[38] Yong, J.: Linear quadratic optimal control problems for mean-field stochastic differential equations: time-consistent solutions. 51(4), 2809–2838 (2013). MR3072755. https://doi.org/10.1137/120892477
[39] Yong, J.: Time-inconsistent optimal control problems and the equilibrium HJB equation. 2(3), 271–329 (2012). MR2991570. https://doi.org/10.3934/mcrf.2012.2.271
[40] Yong, J., Zhou, X.Y.: Springer, New York (1999). MR1696772. https://doi.org/10.1007/978-1-4612-1466-3
[41] Zhang, X., Sun, Z., Xiong, J.: A general stochastic maximum principle for a Markov regime switching jump-diffusion model of mean-field type. 56(4), 2563–2592 (2018). MR3828847. https://doi.org/10.1137/17M112395X
[42] Zhao, Q., Shen, Y., Wei, J.: Consumption-investment strategies with non-exponential discounting and logarithmic utility. 238(3), 824–835 (2014). MR3214861. https://doi.org/10.1016/j.ejor.2014.04.034
[43] Zeng, Y., Li, Z.: Optimal time-consistent investment and reinsurance policies for mean-variance insurers. 49, 145–154 (2011). MR2811903. https://doi.org/10.1016/j.insmatheco.2011.01.001
[44] Zeng, Y., Li, Z., Lai, Y.: Time-consistent investment and reinsurance strategies for mean-variance insurers with jumps. 52(3), 498–507 (2013). MR3054742. https://doi.org/10.1016/j.insmatheco.2013.02.007
[45] Zhou, X.Y., Li, D.: Continuous-time mean-variance portfolio selection: a stochastic LQ framework. 42, 19–33 (2000). MR1751306. https://doi.org/10.1007/s002450010003
[46] Zhou, X.Y., Yin, G.: Markowitz's mean-variance portfolio selection with regime switching: a continuous-time model. 42, 1466–1482 (2003). MR2044805. https://doi.org/10.1137/S0363012902405583