1 Introduction and the main result
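This paper aims at the large deviation principle (LDP) for the solutions to the SDEs
\[ dX_{t}^{\varepsilon }=a\big(X_{t}^{\varepsilon }\big)\,dt+\varepsilon \sigma \big(X_{t}^{\varepsilon }\big)\,dW_{t},\qquad X_{0}^{\varepsilon }=x_{0}\in \mathbb{R}, \tag{1} \]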
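with possibly discontinuous coefficients a,σ. Recall that a family of (the distributions of) random elements {Xε} taking values in a Polish space X is said to satisfy the LDP with rate function I:X→[0,∞] and speed function r:R+→R+ if
\[ \limsup_{\varepsilon \to 0} r(\varepsilon )\log \mathbf{P}\big(X^{\varepsilon }\in F\big)\le -\inf_{x\in F}I(x) \tag{2} \]
for each closed F⊂X and
\[ \liminf_{\varepsilon \to 0} r(\varepsilon )\log \mathbf{P}\big(X^{\varepsilon }\in G\big)\ge -\inf_{x\in G}I(x) \tag{3} \]
for each open G⊂X. The rate function is assumed to be lower semicontinuous; that is, all level sets {x:I(x)≤c}, c≥0, are closed. If all level sets are compact, then the rate function is called good.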
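As a sanity check on the definition, the following sketch evaluates the scaled log-probability for the simplest Gaussian family, which is not the family studied in this paper: Xε=εZ with Z standard normal, speed ε2, and rate function I(x)=x2/2 (a standard fact, assumed here). For the closed set F=[1,∞), the infimum of I over F equals 1/2, so ε2 log P(Xε∈F) should approach −1/2. The function name `scaled_log_tail` is illustrative.

```python
import math


def scaled_log_tail(eps, x):
    """Return eps^2 * log P(eps * Z >= x) for a standard normal Z and x > 0."""
    # P(Z >= u) = erfc(u / sqrt(2)) / 2; erfc stays accurate deep in the tail.
    p = 0.5 * math.erfc(x / (eps * math.sqrt(2.0)))
    return eps ** 2 * math.log(p)


# The values increase toward -inf_{x >= 1} x^2 / 2 = -0.5 as eps decreases.
# Much smaller eps would underflow the probability to 0 in double precision.
for eps in (0.5, 0.2, 0.1, 0.05):
    print(eps, scaled_log_tail(eps, 1.0))
```

The convergence is slow because the polynomial prefactor of the Gaussian tail contributes a correction of order ε2 log(1/ε).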
We assume that, for some C,c>0,
It is well known that, in this case, the SDE (1) has a unique weak solution, which can be obtained by a proper combination of the time change transformation of a Wiener process and the Girsanov transformation of the measure; see [10], IV, §4. In what follows, we fix T>0, interpret the (weak) solution Xε={Xεt,t∈[0,T]} to (1) as a random element in C(0,T), and prove the LDP for the family {Xε}. Since the law of Xε does not depend on a possible change of the sign of σ, in what follows, we assume without loss of generality that σ>0.
Our principal regularity assumption on the coefficients a,σ is that they have no discontinuities of the second kind, that is, they have left- and right-hand limits at every point x∈R. For a given pair of such functions a,σ, we define the modified functions ˉa,ˉσ as follows:
Denote by AC(0,T) the class of absolutely continuous functions ϕ:[0,T]→R, and for each f∈AC(0,T), we denote by ˙f its derivative, which is well defined for almost all t∈[0,T].
Theorem 1.
Theorem 1 has a form very similar to the classical Wentzel–Freidlin theorem ([9], Chapter 3, §2), which establishes the LDP in a much more general setting, for multidimensional Markov processes that may contain both diffusive and jump parts. However, the Wentzel–Freidlin approach substantially exploits the continuity of the infinitesimal characteristics of the process. A natural question arises: to what extent can the continuity assumption be relaxed in this theory? In [5, 6], the LDP was established for multidimensional diffusions with unit diffusion matrix and drift coefficients discontinuous along a given hyperplane; see also [1, 2, 4] for some other results in this direction. In [11], this result, in the one-dimensional setting, is extended to the case of piecewise smooth drift and diffusion coefficients with one common discontinuity point. The technique in the aforementioned papers is based on the analysis of the joint distribution of the process itself and its occupation time in the half-space above the discontinuity point (surface) and is hardly applicable when the structure of the discontinuity sets for the coefficients is more complicated. In [13], the LDP for a one-dimensional SDE with zero drift coefficient was established under a very mild regularity condition on σ: it was only assumed that its discontinuity set has zero Lebesgue measure. Extending this result to the case of a nonzero drift coefficient is far from trivial. In [14], such an extension was provided, but the assumption therein that a/σ2 possesses a bounded derivative is definitely too restrictive. In this paper, we summarize the studies from [13] and [14]; note that the assumption on σ in the current paper is slightly stronger than in [13].
We note that our main result, Theorem 1, well illustrates the relation of the LDP with discontinuous coefficients to the classical Wentzel–Freidlin theory: the rate function in this theorem is given in a classical form, but with the properly modified coefficients. The heuristics of this modification is clearly seen. Namely, thanks to (ii), the rate functional I is lower semicontinuous; see Section 2.2. Assertion (i) corresponds to the fact that, in the case a(x−)≥0 and a(x+)≤0, the family Xε with Xε0=x weakly converges to the constant function equal to x. We interpret the limiting function as the solution to the ODE ˙xt=ˉa(xt), and note that a similar ODE for a may fail to have a solution at all.
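The heuristics behind assertion (i) can be observed numerically. The sketch below uses a plain Euler–Maruyama scheme, which is an assumption of this illustration and not the construction used in the paper (the weak solution is obtained there via a time change and the Girsanov transformation); no convergence claim for the scheme with discontinuous coefficients is made. The drift is chosen with a(0−)=1≥0 and a(0+)=−1≤0, and the names `euler_path`, `drift`, `diffusion` are hypothetical.

```python
import math
import random


def euler_path(a, sigma, x0, eps, T, n_steps, rng):
    """Euler-Maruyama path for dX = a(X) dt + eps * sigma(X) dW on [0, T]."""
    dt = T / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x = x + a(x) * dt + eps * sigma(x) * dw
        path.append(x)
    return path


def drift(x):
    # Pushes toward 0 from both sides: a(0-) = 1 >= 0 and a(0+) = -1 <= 0.
    return 1.0 if x < 0 else -1.0


def diffusion(x):
    return 1.0


rng = random.Random(0)
for eps in (0.4, 0.2, 0.1):
    path = euler_path(drift, diffusion, 0.0, eps, 1.0, 2000, rng)
    # The largest excursion from the starting point 0 shrinks with eps.
    print(eps, max(abs(v) for v in path))
```

For this drift the path started at 0 stays near 0 as ε decreases, which matches the interpretation of the constant function as a solution to ˙xt=ˉa(xt).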
2 Preliminaries to the proof
2.1 Exponential tightness, contraction principle
Recall that a family {Xε} is called exponentially tight with the speed function r(ε) if for each Q>0, there exists a compact set K⊂X such that
For an exponentially tight family, the LDP is equivalent to the weak LDP; the latter by definition means that the upper bound (2) holds for all compact sets F, whereas the lower bound (3) still holds for all open sets G. An equivalent formulation of the weak LDP is the following: for each x∈X,
where Bδ(x) denotes the open ball with center x and radius δ.
To prove (5), we will use a certain extension of the contraction principle, which in its classical form (e.g., [8], Section 3.1, and [7], Section 4.2.1) states the LDP for a family Xε=F(Yε), where Yε is a family of random elements in a Polish space Y that satisfies an LDP with a good rate function J, and F:Y→X is a continuous mapping. The rate function for Xε in this case has the form
\[ I(x)=\inf \big\{J(y):\ y\in \mathbb{Y},\ F(y)=x\big\},\qquad x\in \mathbb{X}. \]
In the sequel, we use two different representations of our particular family Xε as an image of certain family whose LDP is well understood; however, the functions F in these representations fail to be continuous. Within such a framework, the following general lemma appears quite useful. We denote by ρX,ρY the metrics in X,Y and by ΛF the set of continuity points of a mapping F:Y→X. Note that ΛF is Borel measurable; see Appendix II in [3].
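To make the classical contraction formula concrete, here is a toy sketch on a finite set, where the infimum over the preimage of x reduces to a minimum over finitely many points. The setting is purely illustrative: the finite "space", the function name `contract`, and the example map are assumptions and have no counterpart in the proofs below.

```python
import math


def contract(J, F):
    """Push a rate function J (dict: point y -> cost) forward through a map F.

    Returns a dict x -> min{J(y) : F(y) = x}; points outside the image of F
    are absent, which corresponds to the value +infinity (inf over empty set).
    """
    I = {}
    for y, cost in J.items():
        x = F(y)
        I[x] = min(I.get(x, math.inf), cost)
    return I


# Example: J(y) = y^2 / 2 on {-3, ..., 3} and F(y) = |y|.
J = {y: y * y / 2 for y in range(-3, 4)}
I = contract(J, abs)
print(I)
print(I.get(5, math.inf))  # a point outside the image of F
```

In the discontinuous setting of this paper, a single formula of this kind is not available in general, which is why separate upper and lower functionals are introduced next.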
Lemma 1.
Let family Yε satisfy the LDP with speed function r(ε) and rate function J. Assume also that
Then, for any x∈X,
with
where
Proof.
We have
Thus, the upper bound in the LDP for {Yε} gives
where ˉΘδ(x) denotes the closure of Θδ(x). Since ˉΘδ(x)⊂Ξγ,δ(x) for any γ>0, this provides (6). The proof of (7) is even simpler: for any y∈Θδ(x), there exists r>0 such that the image of the ball Br(y) under F is contained in Bδ(x), which yields
□
Lemma 1 is a simplified and more precise version of Lemma 4 in [12]. The functions Iupper, Ilower are lower semicontinuous: we can show this easily using that, for any sequence xn→x, the sets Θδ/2(xn) are embedded into Θδ(x) for n large enough (see, e.g., Proposition 3 in [12]). In fact, Lemma 1 says that for an arbitrary image of a family {Yε}, one part of an LDP (the upper bound) holds with one rate function, whereas the other part (the lower bound) holds with another rate function. This is our reason to call (6) and (7) the upper and the lower semicontraction principles. To prove (5), it suffices to verify the inequalities
We refer to [12] for more discussion and an example where the pair of semicontraction principles does not provide an LDP.
2.2 Lower semicontinuity of I
In this section, we prove directly that the functional I specified in Theorem 1 is lower semicontinuous, that is, it is indeed a rate functional. This will explain the particular choice of the modified functions ˉa,ˉσ. In addition, this will simplify the proofs, where we will use the representation for I(x) presented further.
Define
Then I(f), if it is finite, can be represented as
The function S is continuous; hence, the functional I2 is just continuous. The function ˉa2/ˉσ2 is lower semicontinuous by the choice of ˉa,ˉσ; thus, the functional I3 is lower semicontinuous. Finally, we can represent I1 in the form I1(f)=I0(Σ(f⋅)), where the function
is continuous, and the functional
is known to be lower semicontinuous (this is just the rate functional for the family {εW}). Hence, I1 is lower semicontinuous, which completes the proof of the statement.
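For experimentation, the functional can be discretized on a uniform time grid. The sketch below approximates I(f)=½∫(˙ft−ˉa(ft))2/ˉσ2(ft)dt by a midpoint Riemann sum with finite-difference derivatives; the modified coefficients are passed in by the caller as plain functions, and the name `rate_functional` and the discretization are assumptions of this illustration.

```python
def rate_functional(f, dt, a_bar, sigma_bar):
    """Approximate (1/2) * int_0^T (f'(t) - a_bar(f_t))^2 / sigma_bar(f_t)^2 dt.

    f  : list of values of the path on a uniform grid with step dt
    dt : grid step
    a_bar, sigma_bar : callables giving the (modified) drift and diffusion
    """
    total = 0.0
    for i in range(len(f) - 1):
        df = (f[i + 1] - f[i]) / dt      # finite-difference derivative
        x = 0.5 * (f[i] + f[i + 1])      # midpoint value of the path
        total += (df - a_bar(x)) ** 2 / sigma_bar(x) ** 2 * dt
    return 0.5 * total


grid = [i * 0.01 for i in range(101)]          # f_t = t on [0, 1]
one = lambda x: 1.0
zero = lambda x: 0.0
print(rate_functional(grid, 0.01, one, one))    # path follows the drift
print(rate_functional(grid, 0.01, zero, one))   # Wiener case, a = 0, sigma = 1
```

With a≡0 and σ≡1 the sum reduces to the discretized version of the rate functional I0 for the family {εW}.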
3 Proof of Theorem 1
3.1 Exponential tightness and the weak LDP
In this section, we prove that the family {Xε} is exponentially tight with the speed function r(ε)=ε2. Note that
is a continuous martingale with the quadratic characteristics
see (4). Recall that Mε can be represented as a Wiener process with the time change t↦⟨Mε⟩t; see, for example, [10], II. §7. Then, for each R,
On the other hand, for each ω∈Ω such that ε|Mεt(ω)|>R, the corresponding trajectory of Xε satisfies
and therefore, by the Gronwall inequality,
Therefore, for any Q>0, there exists R such that
Next, recall the Arzelà–Ascoli theorem: for a closed set K⊂C(0,T) to be compact, it is necessary and sufficient that it is bounded and equicontinuous. The family εMε is represented as a time changed family εWε, where each Wε is a Wiener process, and the derivative of ⟨Mε⟩t is bounded by C. Using these observations, it is easy to deduce the exponential tightness for {εMε} using the well-known fact that the family {εW} is exponentially tight. On the other hand, for any ω such that the trajectory of Xεt is bounded by R, the corresponding trajectory of the process Xεt−εMεt satisfies the Lipschitz condition w.r.t. t with the constant C(1+R). Combined with the previous calculation, this easily yields the required exponential tightness.
In what follows, we proceed with the proof of (5). Since now the state space X=C(0,T) is specified, we change the notation and denote the points in this space by f,g,…. Since the set B1(f) is bounded, the law of Xε restricted to any Bδ(f) does not change if we change the coefficients a,σ on the intervals (−∞,−R],[R,∞) with R>0 large enough. Hence, we furthermore assume the coefficients a,σ to be constant on such intervals for some R.
3.2 Case I. Piecewise constant a,σ with one discontinuity point
We proceed with the further proof in a step-by-step way, increasing gradually the classes of the coefficients a,σ for which the corresponding LDP is proved. First, let a,σ be constant on the intervals (−∞,z) and (z,∞) with some z∈R. Without loss of generality, we can assume that z=0. Then we can use Theorem 2.2 [11], where the LDP with the speed function r(ε)=ε2 is established for the pair (Xε,Zε) with
The corresponding rate function in [11] is given in the following form. Denote a±=a(0±),σ±=σ(0±) and define the class H(f) of functions ψ∈AC(0,T) such that
Then the rate functional for (Xε,Zε) equals
with
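\[
L(x,y,z)=\begin{cases}
\dfrac{(y-a(x))^{2}}{\sigma^{2}(x)}, & x\ne 0;\\[2mm]
\dfrac{(a_{+}z+a_{-}(1-z))^{2}}{\sigma_{+}^{2}z+\sigma_{-}^{2}(1-z)}, & x=0,\ \dfrac{a_{-}}{\sigma_{-}^{2}}>\dfrac{a_{+}}{\sigma_{+}^{2}};\\[2mm]
\dfrac{a_{+}^{2}}{\sigma_{+}^{2}}\,z+\dfrac{a_{-}^{2}}{\sigma_{-}^{2}}\,(1-z), & x=0,\ \dfrac{a_{-}}{\sigma_{-}^{2}}\le \dfrac{a_{+}}{\sigma_{+}^{2}}
\end{cases}
\]
for all pairs (f,ψ) such that f∈AC(0,T),f0=x0,ψ∈H(f) and, for all other pairs, I(f,ψ)=∞. From this result, using the contraction principle (see Section 2), we easily derive the LDP for Xε with the rate function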
for f∈AC(0,T), f0=x0 and I(f)=∞ otherwise. Now only a minor analysis is required to show that this rate function actually coincides with that specified in Theorem 1. First, we observe that
This is obvious if either a−/σ2−≤a+/σ2+ or a−>0, a+<0. In the case where a−/σ2−>a+/σ2+ and a−, a+ have the same sign, we can verify directly that the derivative L′z(0,y,z) does not change its sign for z∈[0,1], which completes the proof of the required identity.
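The case analysis for L at x=0 can be checked numerically. The sketch below implements the two x=0 branches of L as given in the display above and minimizes over z∈[0,1] on a grid; the names `L_at_zero` and `min_over_z` and the grid size are illustrative assumptions, and the grid search only stands in for the exact calculus argument.

```python
def L_at_zero(z, a_minus, a_plus, s_minus, s_plus):
    """Value of L(0, y, z) for the one-sided limits a(0-), a(0+), sigma(0-), sigma(0+)."""
    if a_minus / s_minus ** 2 > a_plus / s_plus ** 2:
        num = (a_plus * z + a_minus * (1 - z)) ** 2
        den = s_plus ** 2 * z + s_minus ** 2 * (1 - z)
        return num / den
    return a_plus ** 2 / s_plus ** 2 * z + a_minus ** 2 / s_minus ** 2 * (1 - z)


def min_over_z(a_minus, a_plus, s_minus, s_plus, n=1000):
    """Grid minimum of z -> L(0, y, z) over z in [0, 1]."""
    return min(L_at_zero(i / n, a_minus, a_plus, s_minus, s_plus)
               for i in range(n + 1))


# Drift pointing toward 0 from both sides: the minimum is 0, attained at z = 1/2.
print(min_over_z(1.0, -1.0, 1.0, 1.0))
# Drifts of the same sign: L is monotone in z, so the minimum sits at an endpoint.
print(min_over_z(2.0, 1.0, 1.0, 1.0))
```

In the second example the minimum equals the smaller of the two one-sided values of a2/σ2, in line with the monotonicity of L in z used in the argument above.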
We will use repeatedly the following fact, which follows easily from the change-of-variables formula: for any f∈AC(0,T) and any set A⊂R with zero Lebesgue measure, the Lebesgue measure of the set
see, for example, Lemma 1 in [13]. Applying (10) with A={0}, we conclude that in the above expression for I(f), the function L can be changed to
which completes the proof of Theorem 1 in this case.
3.3 Case II. Piecewise constant a,σ
Let, for some z1<⋯<zm, the functions a,σ be constant on the intervals (−∞,z1), (z1,z2), …,(zm,∞). Assume that x0∉{zk,k=1,…,m}, which does not restrict the generality of the construction given further, and define the functions ak,σk,k=0,…,m by
Consider a family of independent processes Y0,ε,Yn,k,ε, k=1,…,m, n≥1, such that Y0,ε solves SDE (1) with the coefficients a0,σ0 and each Yn,k,ε solves a similar SDE with the coefficients ak,σk and the initial value zk. Define iteratively the process ˜Xε in the following way: put ˜Xε equal to Y0,ε until the time moment
Define the random index κ1∈{1,…,m} such that Y0,ετ1=zκ1. Then put ˜Xεt=Y1,κ1,εt−τ1 until the first time moment τ2 when this process hits {zk,k=1,…,m}∖{zκ1}. Iterating this procedure, we get a process ˜Xεt with
It follows from the strong Markov property of Xε that ˜Xε has the same law as Xε. Hence, the given construction in fact represents the law of Xε as the image of the joint law of the family of independent processes Y0,ε,Yn,k,ε, k=1,…,m, n≥1. Each of these processes is a solution to (1) with corresponding coefficients having at most one discontinuity point; hence, the LDP for them is provided in the previous section. Our idea is to deduce the LDP for Xε via a version of the contraction principle. With this idea in mind, we first perform a simplification of the above representation. For some N (the choice of N will be discussed below), we consider the space Y=C(0,T)^{1+mN} and construct a function F:Y→X=C(0,T) in the following way. For y=(y0,yn,k,k=1,…,m,n=1,…,N), we first define τ1(y)=inf{t:y0t∈{zk,k=1,…,m}} with the usual convention that inf∅=T. The function [F(y)]t, t∈[0,T], is defined to be equal to y0t for t≤τ1(y). If τ1(y)<T, then the construction is iterated: we define κ1(y) by y0τ1(y)=zκ1(y) and put, for t≥τ1(y), [F(y)]t equal to y1,κ1(y)t−τ1(y) up to the first moment when this function hits {zk,k=1,…,m}∖{zκ1(y)}. We iterate this procedure at most N times; that is, if τN(y)<T, then we put
Denote
For any fixed f∈C(0,T), we can choose δf>0 small enough and Nf large enough so that each g∈Bδ(f) has fewer than N Δ-oscillations on [0,T]. Hence, if in the above construction, N is taken equal to Nf, then the restriction of the law of Xε to any ball Bδ(f),δ≤δf, equals the same restriction of the image of the joint law of the finite family Y0,ε, Yn,k,ε, k=1,…,m, n=1,…,N, under the mapping F specified before.
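The pasting map F described above can be sketched on a uniform time grid. Everything below is a discretized illustration under stated assumptions: paths are lists of values on a common grid, a hit is detected when two consecutive grid values touch or straddle a level, 0-based indices replace the indices n and k, and after the N-th switch the last piece is followed to the end of the grid (the source formula for this last step is not reproduced here, so this is an assumption). The names `first_hit` and `patchwork` are hypothetical.

```python
def first_hit(path, levels):
    """Return (i, k) for the first grid index i >= 1 at which `path` touches or
    crosses a level z_k from `levels` (list of (k, z_k) pairs); None if no hit.
    If one step crosses several levels, the first one in `levels` is reported."""
    for i in range(1, len(path)):
        for k, zk in levels:
            if (path[i - 1] - zk) * (path[i] - zk) <= 0:
                return i, k
    return None


def patchwork(y0, pieces, z, N):
    """Discretized pasting map: follow y0 until it hits some z[k], then follow
    pieces[0][k] (which starts at z[k]) until it hits another level, and so on,
    switching at most N times. pieces[n][k] is the path used after switch n+1."""
    n_grid = len(y0)
    out = []
    current = y0
    allowed = list(enumerate(z))   # levels whose hitting triggers a switch
    used = 0                       # number of switches performed so far
    while len(out) < n_grid:
        remaining = n_grid - len(out)
        hit = first_hit(current[:remaining], allowed) if used < N else None
        if hit is None:
            out.extend(current[:remaining])
            break
        i, k = hit
        out.extend(current[:i])    # values strictly before the hitting index
        current = pieces[used][k]  # new piece; its first value z[k] is emitted next
        allowed = [(j, zj) for j, zj in enumerate(z) if j != k]
        used += 1
    return out


z = [0.0, 1.0]
y0 = [0.5, 0.75, 1.0, 1.25, 1.5]
pieces = [[[0.0] * 5, [1.0, 0.9, 0.8, 0.7, 0.6]]]
print(patchwork(y0, pieces, z, N=1))
```

Small perturbations of y0 near a tangential touch of a level can change the hitting index abruptly, which is the discrete counterpart of the discontinuity of τn(⋅) and κn(⋅).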
We aim to verify (5), and we argue in the following way. We fix f and choose N=Nf as before, so that the laws of Xε, restricted to Bδ(f) for δ small, can be obtained as the image under F specified before. Then we prove (8) at this particular point x=f, with Ilower, Iupper being constructed by this particular F. This yields the required weak LDP (5).
Within such an argument, we have to treat for any N the image under the corresponding F of the family of laws in Y=C(0,T)1+mN, which, according to the result proved in the previous section, satisfies the LDP with the rate function
for
such that y0,yn,k∈AC(0,T), y00=x0, yn,k0=zk and J(y)=∞ otherwise. To apply Lemma 1 in this setting, we first analyze the structure and the properties of the corresponding F.
Each trajectory f=F(y)∈C(0,T) is actually a patchwork, which consists of pieces of trajectories y0,yn,k, k=1,…,m, n=1,…,N: the pasting points are τ1(y),…,τr(y), r=r(y)≤N, and after τn(y), the (part of the) new trajectory is used with the number κn(y). For a sequence yl→y in Y, the corresponding sequence of trajectories fl=F(yl) may fail to converge to f because the functionals τn(⋅),κn(⋅) are not continuous. However, the above “patchwork representation” easily yields the following two facts:
• any limit point f∗ of the sequence {fl} possesses a similar representation with the same y=limlyl and with the corresponding pasting points τ∗1,τ∗2,… and numbers κ∗1,κ∗2,… being partial limits of the sequences {τ1(yl)},{τ2(yl)},… and {κ1(yl)},{κ2(yl)},…;
• if the functions τ1(⋅),τ2(⋅),… are continuous at a given point y∈Y, then y∈ΛF.
Using the first fact, it is now easy to prove the second inequality in (8). If it fails, then for a given f, there exists a sequence {yl} such that F(yl)→f and J(yl)≤c<I(f). Since the level set {y:J(y)≤c} is compact, we can assume without loss of generality that yl converge to some y; recall that J is lower semicontinuous and thus J(y)≤c. The function f possesses the above patchwork representation with the trajectories taken from y, some pasting points τ∗1,…,τ∗r, and some numbers κ∗1,…,κ∗r. From this representation it is clear that f∈AC(0,T) and f0=x0: if this fails, then the same properties fail for at least one trajectory from the family y and thus J(y)=∞, which contradicts J(y)≤c. Hence, we have
where we put τ∗0=0,τ∗r+1=T. Let x0 be located on some interval (zk−1,zk), k=2,…,m, say, x0∈(z1,z2). Then, on the interval (0,τ∗1), the trajectory f is contained in the segment [z1,z2]. The functions a0,σ0 are constant and coincide with ˉa,ˉσ on (z1,z2). In addition, a0=a(z1+)=a(z2−), σ0=σ(z1+)=σ(z2−); hence, by the choice of ˉa,ˉσ we have
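Then by (10) with A={z1,z2} we have
\[
\begin{aligned}
\int_{0}^{\tau_{1}^{\ast}}\frac{(\dot{f}_{t}-\bar{a}(f_{t}))^{2}}{\bar{\sigma}^{2}(f_{t})}\,dt
&=\int_{0}^{\tau_{1}^{\ast}}\bigg(\frac{(\dot{f}_{t}-a_{0}(f_{t}))^{2}}{\sigma_{0}^{2}(f_{t})}1_{f_{t}\notin A}+\frac{\bar{a}^{2}(f_{t})}{\bar{\sigma}^{2}(f_{t})}1_{f_{t}\in A}\bigg)\,dt\\
&\le \int_{0}^{\tau_{1}^{\ast}}\bigg(\frac{(\dot{f}_{t}-a_{0}(f_{t}))^{2}}{\sigma_{0}^{2}(f_{t})}1_{f_{t}\notin A}+\frac{a_{0}^{2}(f_{t})}{\sigma_{0}^{2}(f_{t})}1_{f_{t}\in A}\bigg)\,dt\\
&=\int_{0}^{\tau_{1}^{\ast}}\frac{(\dot{f}_{t}-a_{0}(f_{t}))^{2}}{\sigma_{0}^{2}(f_{t})}\,dt
=\int_{0}^{\tau_{1}^{\ast}}\frac{(\dot{y}_{t}^{0}-a_{0}(y_{t}^{0}))^{2}}{\sigma_{0}^{2}(y_{t}^{0})}\,dt.
\end{aligned}
\]
Analogous inequalities hold on each of the time intervals (τ∗n,τ∗n+1), n=1,…,r, with a0,σ0 changed to ˉaκ∗n,ˉσκ∗n (the proof is similar and omitted). Thus,
\[
\begin{aligned}
I(f)&\le \frac{1}{2}\int_{0}^{\tau_{1}^{\ast}}\frac{(\dot{y}_{t}^{0}-a_{0}(y_{t}^{0}))^{2}}{\sigma_{0}^{2}(y_{t}^{0})}\,dt
+\frac{1}{2}\sum_{n=1}^{r}\int_{\tau_{n}^{\ast}}^{\tau_{n+1}^{\ast}}\frac{(\dot{y}_{t}^{\kappa_{n}^{\ast},n}-\bar{a}_{\kappa_{n}^{\ast}}(y_{t}^{\kappa_{n}^{\ast},n}))^{2}}{\bar{\sigma}_{\kappa_{n}^{\ast}}^{2}(y_{t}^{\kappa_{n}^{\ast},n})}\,dt\\
&\le \frac{1}{2}\int_{0}^{T}\frac{(\dot{y}_{t}^{0}-a_{0}(y_{t}^{0}))^{2}}{\sigma_{0}^{2}(y_{t}^{0})}\,dt
+\frac{1}{2}\sum_{k=1}^{m}\sum_{n=1}^{N}\int_{0}^{T}\frac{(\dot{y}_{t}^{k,n}-\bar{a}_{k}(y_{t}^{k,n}))^{2}}{\bar{\sigma}_{k}^{2}(y_{t}^{k,n})}\,dt=J(y).
\end{aligned}
\tag{11}
\]
This gives a contradiction with the inequalities J(y)≤c and I(f)>c, which completes the proof of the second inequality in (8).
The first inequality in (8) holds immediately for f such that I(f)=∞. We fix f with I(f)<∞ and γ>0 and construct yγ such that F(yγ)=f, the functions τ1(⋅),τ2(⋅),… are continuous at yγ, and J(yγ)≤I(f)+γ. This completes the proof of (8).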
The construction explained gives a cue for the choice of y=yγ (we omit the index γ to simplify the notation). We put y0 equal to f until its first time moment τ∗1 of hitting the set {z1,z2} (we still assume that x0∈(z1,z2)). Then we extend y0 to the entire time interval [0,T], and we aim to make the integral
small enough; that is, to make small the error in the second inequality in (11) that comes from the integral of y0. If we put y0t=yτ∗1+a0(t−τ∗1) for t≥τ∗1, then we obtain a trajectory for which the integral (12) equals zero; we call such a trajectory a zero-energy one. However, such a choice may violate our other requirement, namely that τ1(⋅) should be continuous at the point y. It is easy to verify that, for such continuity, it suffices that y0, when it hits {z1,z2} at a point, say z1, takes values both from (−∞,z1) and (z1,∞) on every interval (τ∗1,τ∗1+δ), δ>0. We can perturb the zero-energy trajectory introduced above on a small time interval near τ∗1 in such a way that the new trajectory possesses the continuity property explained before, and the integral (12) is ≤γ/N.
Then we iterate this procedure. Observe that, for any k, by the construction of the function ˉak there exists at least one corresponding zero-energy trajectory with the initial value zk, which now is defined as a solution to the ODE
We have κ∗1 uniquely determined by the trajectory f (in fact, by the part of this trajectory up to time τ∗1). For k≠κ∗1, we define yk,1 as the zero-energy trajectory on [0,T] that starts from zk and corresponds to the coefficient ˉak. All these trajectories are “phantom” in the sense that they are neither involved in the representation of f through y nor contribute to J(y). For k=κ∗1, we define yk,1 similarly to the above: it equals ft+τ∗1 for t≤τ∗2−τ∗1, and afterwards it is defined as a perturbation of a zero-energy trajectory that makes τ2(⋅) continuous at y and
Repeating this construction ≤N times, we finally get the required function y=yγ. This completes the proof of (8) and thus of (5). Together with the exponential tightness proved in Section 3.1, this completes the proof of the LDP in this case.
3.4 Case III. Piecewise constant a/σ2, general σ
In this section, we remove the assumption on a,σ to be piecewise constant, still keeping this assumption for a/σ2; we also assume that a,σ are constant on (−∞,R] and [R,∞) for some R. Our basic idea is to represent {Xε} as the image under a time changing transformation of a family {Yε} and then to use the semicontraction principles. The same approach was used in [13], where the LDP was established for a solution of (1) with a≡0; in this case, Yε was taken in the form Yεt=x0+εWt. In our current setting, the choice of the coefficients for the SDE that defines Yε should take into account the common discontinuity points for a/σ and σ. This becomes visible both from an analysis of the proof of Theorem 1 in [13] and from the definition of the functions ˉa,ˉσ, which combines the left- and right-hand values of both a and σ at the discontinuity points. The proper choice of the family is explained below. Some parts of the arguments are similar to those in [13]. We omit detailed proofs whenever it is possible to give a reference to [13] and focus on the particularly new points.
We assume a/σ2 to be piecewise constant with discontinuity points z1<⋯<zm and put (with the convention ∏∅=1)
Under such a choice, ˜σ=συ, and thus the function ˜a/˜σ2 equals a/σ2 and is constant on each of the intervals (−∞,z1),…,(zm,∞). By construction, ˜σ is constant on these intervals as well; hence, ˜a,˜σ fit the case studied in the previous section, and the required LDP holds for the family Yε of the solutions to (1) with these coefficients and Yε0=x0. This construction yields also the following property, which will be important below: the function a=(a/σ2)σ2 does not change its sign on each of the intervals (−∞,z1),…,(zm,∞). Hence, denoting B=a2/σ2 and ˉB=ˉa2/ˉσ2, we get
Fix ε>0 and define
τt=[η]−1t (the inverse function w.r.t. t), and Xεt=Yετt. Then Xε is a weak solution to (1) with Xε0=x0; see [10], IV §7.
In the above construction, ηt≥c2t and thus τt≤c−2t; see (4). We put ˜T=c−2T, Y=C(0,˜T), and define Yε as a family of solutions to (1) with the coefficients ˜a,˜σ and the time horizon ˜T. Then the family Xε possesses a representation Xε=F(Yε) with the mapping F:Y→X defined by
Observe that for F to be continuous at a point y∈Y, it suffices that y spends zero time in the set Δυ of the discontinuity points of the function υ; see [13], Lemma 1 and Corollary 1. Now Δυ⊂Δa∪Δσ is at most countable, and it is easy to see that the continuity set ΛF has probability 1 w.r.t. the distribution of each Yε, that is, we can apply Lemma 1.
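The time-change mapping can be sketched numerically as follows. The sketch assumes that ηt(y)=∫0tυ2(ys)ds, which is inferred here from the relation between ˙τ and υ2 used later in the text and is not copied from the elided display; τ is then the inverse of η, and F(y)t=yτt. Paths live on a uniform grid, linear interpolation is used for inversion, and the names `interp` and `time_change` are hypothetical.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear interpolation of the points (xs, ys) at x.
    xs must be strictly increasing; values outside [xs[0], xs[-1]] are clamped."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)                     # xs[j-1] <= x < xs[j]
    w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + w * (ys[j] - ys[j - 1])


def time_change(y, upsilon, ds, t_grid):
    """Return f_t = y_{tau_t} for t in t_grid, where tau is the inverse of
    eta_s = int_0^s upsilon(y_r)^2 dr (assumed form), computed by the trapezoid rule."""
    s = [i * ds for i in range(len(y))]
    v2 = [upsilon(v) ** 2 for v in y]
    eta = [0.0]
    for i in range(1, len(y)):
        eta.append(eta[-1] + 0.5 * (v2[i - 1] + v2[i]) * ds)
    # eta is strictly increasing because upsilon^2 > 0, so it can be inverted.
    return [interp(interp(t, eta, s), s, y) for t in t_grid]


# Example: y_s = s on [0, 4] and upsilon = 2, so eta_s = 4s, tau_t = t/4, f_t = t/4.
y = [i * 0.01 for i in range(401)]
print(time_change(y, lambda x: 2.0, 0.01, [0.0, 0.5, 1.0]))
```

When y lingers at a discontinuity point of υ, the integrand of η depends on the value assigned to υ there, which is the discrete trace of the continuity condition stated above.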
Our further aim is to prove (8) in the above setting, which then would imply (5) and thus prove the LDP. The general idea of the proof is similar to that of Theorem 1 in [13], though particular technicalities differ substantially.
First, for a given f∈X, we describe explicitly the set F−1({f}). We put
If f=F(y), then
here we changed the variables r=τs(y) and used that
Therefore,
Observe that ζT(f)≤c−2T=˜T and define
Then we conclude that
that is, for any y∈F−1({f}), the part of its trajectory with t≤ζT(f) is uniquely defined. On the other hand, it is easy to show that any y∈Y satisfying (14) belongs to F−1({f}).
Next, we denote by ˆa,ˆσ the modified coefficients, which correspond to the coefficients ˜a,˜σ in the sense explained in Section 1. Since ˜a=aυ2 and ˜σ=συ, we easily see that
at every continuity point x for υ. Then, for any y∈AC(0,˜T) with y0=x0 that spends zero time in the set Δυ, we have
On the other hand, using (14) and making the time change s=πt(f) with f=F(y), we get
12∫ζT(F(y))0(˙yt−ˉa(yt)υ2(yt))2ˉσ2(yt)υ2(yt)dt=12∫ζT(F(y))0(˙ytυ−2(yt)−ˉa(yt))2ˉσ2(yt)υ2(yt)dt=12∫T0(˙fs−ˉa(fs))2ˉσ2(fs)ds=I(F(y))
because now t=ζs(f)=τs(y) and thus
Thus,
with
Now we are ready to proceed with the proof of the first inequality in (8).
Proof.
We consider only f such that I(f)<∞; otherwise, the required inequality is trivial. Let us fix a function y corresponding to f by the following convention: it is given by identity (14) up to the time moment t=ζT(f) and follows a zero-energy trajectory afterward, that is, satisfies
a.e. w.r.t. the Lebesgue measure. We note that at least one such zero-energy trajectory exists (it may be nonunique, and in this case, we just fix one such trajectory). Indeed, by construction, ˜a is piecewise constant, so that the corresponding ˆa is piecewise constant as well. The proper choice of ˆa(zk) at those points zk where ˜a(zk−)>0,˜a(zk+)<0 yields that the above ODE, which determines a zero-energy trajectory, admits at least one solution.
If f spends zero time in the set Δυ of discontinuity points for υ, then the same property holds for the corresponding y constructed above. Indeed, the first part of the trajectory y is just the time-changed trajectory f, and the second part is a zero-energy trajectory. The latter trajectory is piecewise linear, and we can separate a finite set of time intervals where it either (a) moves with a constant speed ≠0 (and thus spends a zero time in the set Δυ, which has zero Lebesgue measure) or (b) stays constant (in this case, it equals zk for some k, and, by construction, υ is continuous at {zk}). Hence, we conclude that (16) holds and, moreover, Jtail(y)=0, that is, J(y)=I(f). In addition, y∈ΛF, which gives for this f the required inequality
For a general f, we will show that, for each δ>0, there exists fδ such that fδ∈Bδ(f), I(fδ)≤I(f)+δ, and fδ spends zero time in Δυ; since Ilower is known to be lower semicontinuous, this will complete the proof of the first inequality in (8). Recall the decomposition I=I1+I2+I3 from Section 2.2 and note that I2(fn)→I2(f) if fn→f in the uniform distance and I1(fn)→I1(f) if fn→f in the distance
Hence, our aim is to construct a function fδ that is close to f both in the uniform distance and in dΣ, spends zero time in the set Δυ, and
We decompose the time set
into a disjoint union of open intervals and modify the function f on each of these intervals. On the complement to this union, the function fδ will remain the same; note that υ is continuous at every point zk, and hence in order to get a function that spends zero time in Δυ, it suffices to modify f on Q only. In what follows, we fix an interval (α,β) from the decomposition of the set Q and describe the way to modify f on (α,β). The construction below is mostly motivated by (13). We fix some γ>0 and choose a finite partition {uj} of the set {ft,t∈[α,β]} such that the oscillation of the function ˉσ2 on each interval (uj−1,uj) does not exceed γ. Then there exists a finite partition α=t0<⋯<tm=β such that, on each time segment [ti−1,ti], the function f visits at most one point from the set {uj}. Then, on each interval [ti−1,ti], we consider the family
where ϕi is a function such that
and si is defined by the following convention: si=+1 if ˉB is right-continuous at the (unique) point from the set {uj} that is visited by f on [ti−1,ti] or if f does not visit this set; otherwise, si=−1. If, in addition, ˙ϕit≠0 a.e., then for all κ>0 except for an at most countable set of points, we have that fi,κ spends zero time in Δυ on the time interval; see [13], Lemma 2. The choice of the sign si yields that, for κ>0 small enough,
Then κ>0 can be chosen small enough and the same for all intervals [ti−1,ti], so that the corresponding function ˜fκ, which coincides with fi,κ on [ti−1,ti], satisfies
It is also easy to see that, in addition, the following inequalities can be guaranteed by the choice of (small) κ:
Repeating the same construction on each interval from the partition for Q, we get a function ˜f such that
and \tilde{f} spends zero time in \varDelta _{\upsilon }. Taking in this construction \gamma >0 small enough, we obtain the required function {f}^{\delta }=\tilde{f}, which completes the proof of (17). □
Recall that B and \bar{B} satisfy (13). For the similar pair of functions \tilde{B}={\tilde{a}}^{2}/{\tilde{\sigma }}^{2} and \hat{B}={\hat{a}}^{2}/{\hat{\sigma }}^{2}, we have even more: the functions \hat{a},\hat{\sigma } are constant on each of the intervals (-\infty ,z_{1}),\dots ,(z_{m},\infty ); hence,
On the other hand, since \tilde{a}=a{\upsilon }^{2} and \tilde{\sigma }=\sigma \upsilon , we have B=\tilde{B}{\upsilon }^{-2}, and thus
This yields, for z\notin \{z_{k}\},
Recall that υ is continuous at each point z_{k}; hence, by (15) identity (18) holds for all z\in \mathbb{R}.
Now we are ready to proceed with the proof of the second inequality in (8).
Proof.
Assuming (19) to fail for some f, we will have sequences \{{y}^{n}\},\{{\tilde{y}}^{n}\} such that \{{\tilde{y}}^{n}\}\subset \varLambda _{F},
F\big({\tilde{y}}^{n}\big)\to f,\hspace{2em}\big\| {y}^{n}-{\tilde{y}}^{n}\big\| \to 0,\hspace{1em}\text{and}\hspace{1em}\underset{n}{\limsup }J\big({y}^{n}\big)<I(f).
Then \{{y}^{n}\} belongs to some level set \{J(y)\le c\} of a good rate function J. Hence, passing to a subsequence, we can assume that both \{{y}^{n}\} and \{{\tilde{y}}^{n}\} converge to some y\in \mathbb{Y}. In addition, J(y)\le \liminf _{n}J({y}^{n})<I(f). Next, denote
where \tau (\cdot ) is the function introduced in the definition of F. Then each {\tau }^{n}\in \mathit{AC}(0,T) with its derivative taking values from [{C}^{-2},{c}^{-2}]; see (4). This allows us, passing to a subsequence, to assume that there exists a uniform limit \tau =\lim _{n}{\tau }^{n} and that {\dot{\tau }}^{n}\to \dot{\tau } weakly in L_{2}(0,T).
Observe that
\[ \underset{t\in [0,T]}{\sup }\big|F\big({\tilde{y}}^{n}\big)_{t}-y_{{\tau _{t}^{n}}}\big|=\underset{t\in [0,T]}{\sup }\big|{\tilde{y}_{{\tau _{t}^{n}}}^{n}}-y_{{\tau _{t}^{n}}}\big|\to 0. \]
Thus,
Then we have a representation for the part of the trajectory y similar to (14):
We observe that
\[ J(y)\ge \frac{1}{2}{\int _{0}^{\tau _{T}}}\frac{{(\dot{y}_{s}-\hat{a}(y_{s}))}^{2}}{{\hat{\sigma }}^{2}(y_{s})}\hspace{0.1667em}ds \]
and give a decomposition for the latter integral, similar to (9). Recall that the function \hat{a}/{\hat{\sigma }}^{2} coincides with \bar{a}/{\bar{\sigma }}^{2} at each point except the finite set \{z_{k}\}. Then
\[ S(x)={\int _{0}^{x}}\frac{\bar{a}(z)}{{\bar{\sigma }}^{2}(z)}\hspace{0.1667em}dz={\int _{0}^{x}}\frac{\hat{a}(z)}{{\hat{\sigma }}^{2}(z)}\hspace{0.1667em}dz,\hspace{1em}x\in \mathbb{R}, \]
and thus
\[ J(y)\ge \frac{1}{2}{\int _{0}^{\tau _{T}}}\frac{{(\dot{y}_{s})}^{2}}{{\hat{\sigma }}^{2}(y_{s})}\hspace{0.1667em}ds+\big[S(y_{\tau _{T}})-S(y_{0})\big]+\frac{1}{2}{\int _{0}^{\tau _{T}}}\frac{{\hat{a}}^{2}(y_{s})}{{\hat{\sigma }}^{2}(y_{s})}\hspace{0.1667em}ds=:J_{1}+J_{2}+J_{3}. \tag{21} \]
Now we will use (20) in order to compare J_{i}, i=1,2,3, with I_{i}(f), i=1,2,3. We have directly that J_{2}=I_{2}(f). Next, we change the variables s=\tau _{t}, and get
\[ J_{3}=\frac{1}{2}{\int _{0}^{T}}\frac{{\hat{a}}^{2}(f_{t})}{{\hat{\sigma }}^{2}(f_{t})}\dot{\tau }_{t}\hspace{0.1667em}dt. \]
Recall that we assumed \dot{\tau } to be the L_{2}-weak limit of
On the other hand, {\tilde{y}_{{\tau _{t}^{n}}}^{n}}\to f_{t}, and then it is easy to show that, for a.a. t\in [0,T],
\[ \frac{1}{\max ({\upsilon }^{2}(f_{t}-),{\upsilon }^{2}(f_{t}+))}\le \dot{\tau }_{t}\le \frac{1}{\min ({\upsilon }^{2}(f_{t}-),{\upsilon }^{2}(f_{t}+))}. \tag{22} \]
Then by (18) we get
\[ J_{3}\ge \frac{1}{2}{\int _{0}^{T}}\frac{{\hat{a}}^{2}(f_{t})}{{\hat{\sigma }}^{2}(f_{t})}\frac{1}{\max ({\upsilon }^{2}(f_{t}-),{\upsilon }^{2}(f_{t}+))}\hspace{0.1667em}dt=\frac{1}{2}{\int _{0}^{T}}\bar{B}(f_{t})\hspace{0.1667em}dt=I_{3}(f). \]
Finally, changing the variables s=\tau _{t}, we get
\[ J_{1}=\frac{1}{2}{\int _{0}^{T}}\frac{{(\dot{f}_{t})}^{2}}{{\hat{\sigma }}^{2}(f_{t})\dot{\tau }_{t}}\hspace{0.1667em}dt. \]
Denote Q=\{t\in [0,T]:f_{t}\in \varDelta _{\upsilon }\} and recall that because \varDelta _{\upsilon } has zero Lebesgue measure, \dot{f}_{t}=0 for a.a. t\in Q. On the other hand, if f_{t}\notin \varDelta _{\upsilon }, then by (15) and (22) we have
thus,
\[ J_{1}=\frac{1}{2}\int _{[0,T]\setminus Q}\frac{{(\dot{f}_{t})}^{2}}{{\hat{\sigma }}^{2}(f_{t})\dot{\tau }_{t}}\hspace{0.1667em}dt=\frac{1}{2}\int _{[0,T]\setminus Q}\frac{{(\dot{f}_{t})}^{2}}{{\bar{\sigma }}^{2}(f_{t})}\hspace{0.1667em}dt=I_{1}(f). \]
Summarizing the above, we get J(y)\ge I(f), which contradicts the assumption made at the beginning of the proof. □
3.5 Completion of the proof: general a,\sigma
In this last part of the proof, we remove the assumption a/{\sigma }^{2} to be piecewise constant and prove the required statement in the full generality. According to Section 2.1, it suffices to prove that, for fixed f\in C(0,T) and \varkappa >0,
\[ \underset{\varepsilon \to 0}{\limsup }{\varepsilon }^{2}\log \mathbf{P}\big({X}^{\varepsilon }\in B_{\delta }(f)\big)\le -I(f)+\varkappa \tag{23} \]
for some \delta >0 and
\[ \underset{\varepsilon \to 0}{\liminf }{\varepsilon }^{2}\log \mathbf{P}\big({X}^{\varepsilon }\in B_{\delta }(f)\big)\ge -I(f)-\varkappa \tag{24} \]
for each \delta >0. While doing that, we can and will assume that a,\sigma are constant on (-\infty ,R] and [R,\infty ) for some R.
Consider, together with the original SDE (1), the SDE
\[ d{Y_{t}^{\varepsilon }}=\big[a\big({Y_{t}^{\varepsilon }}\big)+\alpha \big({Y_{t}^{\varepsilon }}\big)\sigma \big({Y_{t}^{\varepsilon }}\big)\big]\hspace{0.1667em}dt+\varepsilon \sigma \big({Y_{t}^{\varepsilon }}\big)dW_{t},\hspace{1em}{Y_{0}^{\varepsilon }}=x_{0}\in \mathbb{R}, \tag{25} \]
where α is a bounded function to be specified later. Then by the Girsanov theorem
\[ \mathbf{P}\big({X_{\cdot }^{\varepsilon }}\in A\big)=\mathbf{E}1_{{Y}^{\varepsilon }\in A}{\mathcal{E}_{T}^{\varepsilon }}, \]
\[ {\mathcal{E}_{T}^{\varepsilon }}:=\exp \Bigg[-{\varepsilon }^{-1}{\int _{0}^{T}}\alpha \big({Y_{t}^{\varepsilon }}\big)\hspace{0.1667em}dW_{t}-\frac{{\varepsilon }^{-2}}{2}{\int _{0}^{T}}{\alpha }^{2}\big({Y_{t}^{\varepsilon }}\big)\hspace{0.1667em}dt\Bigg]; \]
see [10], Chapter IV, Theorem 4.2. Then, for arbitrary p,q>1:1/p+1/q=1, we have
\mathbf{P}\big({X_{\cdot }^{\varepsilon }}\in A\big)\le \mathbf{P}{\big({Y_{\cdot }^{\varepsilon }}\in A\big)}^{1/p}{\big(\mathbf{E}{\big({\mathcal{E}_{T}^{\varepsilon }}\big)}^{q}\big)}^{1/q}.
Let |\alpha (x)|\le \gamma . Then we have
\begin{array}{r@{\hskip0pt}l}\displaystyle \mathbf{E}{\big({\mathcal{E}_{T}^{\varepsilon }}\big)}^{q}& \displaystyle =\mathbf{E}\exp \Bigg[-q{\varepsilon }^{-1}{\int _{0}^{T}}\alpha \big({Y_{t}^{\varepsilon }}\big)\hspace{0.1667em}dW_{t}-q\frac{{\varepsilon }^{-2}}{2}{\int _{0}^{T}}{\alpha }^{2}\big({Y_{t}^{\varepsilon }}\big)\hspace{0.1667em}dt\Bigg]\\{} & \displaystyle =\mathbf{E}\exp \Bigg[\big({q}^{2}-q\big)\frac{{\varepsilon }^{-2}}{2}{\int _{0}^{T}}{\alpha }^{2}\big({Y_{t}^{\varepsilon }}\big)\hspace{0.1667em}dt\Bigg]{\mathcal{E}_{T}^{q\varepsilon }}\le {e}^{({q}^{2}-q){\gamma }^{2}T{\varepsilon }^{-2}/2}\mathbf{E}\hspace{0.1667em}{\mathcal{E}_{T}^{q\varepsilon }},\end{array}
where {\mathcal{E}}^{q\varepsilon } denotes the exponential obtained from {\mathcal{E}}^{\varepsilon } by replacing α with q\alpha . Since α is bounded, {\mathcal{E}}^{q\varepsilon } is a martingale, and hence \mathbf{E}\hspace{0.1667em}{\mathcal{E}_{T}^{q\varepsilon }}=1. Because ({q}^{2}-q)/q=q-1, we can summarize the above calculation as follows:
\mathbf{P}\big({X_{\cdot }^{\varepsilon }}\in A\big)\le \mathbf{P}{\big({Y_{\cdot }^{\varepsilon }}\in A\big)}^{1/p}{e}^{(q-1){\gamma }^{2}T{\varepsilon }^{-2}/2}.
In what follows, we will choose α such that the function (a+\alpha \sigma )/{\sigma }^{2} is piecewise constant. Then the result proved in the previous section yields that, for given f\in C(0,T) and \kappa >0, there exists \delta >0 such that
(26)
\underset{\varepsilon \to 0}{\limsup }{\varepsilon }^{2}\log \mathbf{P}\big({Y}^{\varepsilon }\in B_{\delta }(f)\big)\le -\tilde{I}(f)+\frac{\kappa }{4},
where \tilde{I} is the rate functional that corresponds to the new drift coefficient a+\alpha \sigma and the same diffusion coefficient. It is easy to verify using the representation (9) and its analogue for \tilde{I} that we can choose γ small enough so that the above construction with arbitrary α such that \| \alpha \| \le \gamma yields
In that case, we get
\underset{\varepsilon \to 0}{\limsup }{\varepsilon }^{2}\log \mathbf{P}\big({X}^{\varepsilon }\in B_{\delta }(f)\big)\le -\frac{1}{p}I(f)+\frac{\kappa }{4p}+\frac{(q-1){\gamma }^{2}T}{2}.
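The exponential moment of the Girsanov density used above can be sanity-checked in the simplest case of a constant function α ≡ c, which is an assumption made only for this illustration. In that case the stochastic integral equals cW_T, and the q-th moment has the closed form exp((q²-q)c²Tε⁻²/2), where the factor T comes from integrating α² over [0,T]. The sketch below compares this closed form with a direct numerical integration against the N(0,T) density; the function names and the parameter values are hypothetical.

```python
import math


def density_moment_exact(c: float, eps: float, T: float, q: float) -> float:
    """Closed form of E[(E_T)^q] for constant alpha = c (Gaussian moment generating function)."""
    return math.exp((q * q - q) * c * c * T / (2.0 * eps * eps))


def density_moment_numeric(c: float, eps: float, T: float, q: float,
                           n: int = 20001, width: float = 12.0) -> float:
    """The same expectation by trapezoidal integration over W_T ~ N(0, T)."""
    lam = -q * c / eps                          # coefficient of W_T in the exponent
    shift = -q * c * c * T / (2.0 * eps * eps)  # deterministic part of the exponent
    center, sd = lam * T, math.sqrt(T)          # the integrand peaks at lam * T
    lo = center - width * sd
    step = 2.0 * width * sd / (n - 1)
    total = 0.0
    for i in range(n):
        w = lo + i * step
        log_f = lam * w + shift - w * w / (2.0 * T) - 0.5 * math.log(2.0 * math.pi * T)
        total += (0.5 if i in (0, n - 1) else 1.0) * math.exp(log_f)
    return total * step


# Hypothetical parameter values, chosen only for the check.
c, eps, T, q = 0.3, 0.5, 2.0, 3.0
exact = density_moment_exact(c, eps, T, q)
numeric = density_moment_numeric(c, eps, T, q)
# Contribution of the density factor to eps^2 * log P after the Hoelder step.
penalty = eps * eps * math.log(exact) / q
```

For constant α the bound on the moment is attained with γ = |c|, and `penalty` equals (q-1)c²T/2, which is the form of the last term in the estimate above.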
Now we are ready to summarize the entire argument. For given f\in C(0,T) and \kappa >0, we take p>1 close enough to 1 such that
Then we take \gamma >0 small enough such that (27) holds and
Observe that the function a/{\sigma }^{2} has left and right limits at every point and is constant on (-\infty ,-R] and [R,\infty ). Then it can be approximated by piecewise constant functions in the uniform norm. This means that we can find α with \| \alpha \| \le \gamma such that the function (a+\alpha \sigma )/{\sigma }^{2} is piecewise constant; for instance, \alpha =\sigma (h-a/{\sigma }^{2}) with a piecewise constant function h sufficiently close to a/{\sigma }^{2} in the uniform norm. Then there exists \delta >0 such that (26) holds, and we finally deduce
\underset{\varepsilon \to 0}{\limsup }{\varepsilon }^{2}\log \mathbf{P}\big({X}^{\varepsilon }\in B_{\delta }(f)\big)\le -I(f)+\frac{\kappa }{4}+\frac{\kappa }{4p}+\frac{\kappa }{4}+\frac{\kappa }{4}<-I(f)+\kappa ,
which completes the proof of (23). Exactly the same argument provides the proof of (24) as well, with the minor change that now the law of {Y}^{\varepsilon } should be expressed in terms of {X}^{\varepsilon }; that is, we should use the Girsanov theorem in the following form:
The rest of the proof remains literally the same; we omit the detailed exposition.
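The choice of α in the argument above can be illustrated numerically. The sketch below is not the construction used in the paper: the coefficients `a` and `sigma`, the value of `R`, and the number of cells are hypothetical and chosen only so that a/σ² has a jump at 0 and is constant outside [-R, R]. It builds a piecewise constant function h by freezing a/σ² at the midpoint of each cell of a partition that contains the jump point, sets α = σ(h - a/σ²), and checks on a finite grid (not over the whole line) that α is small and that (a+ασ)/σ² coincides with h.

```python
import math

R = 2.0                 # a and sigma are constant outside [-R, R]
N_CELLS = 512           # even, so the jump point x = 0 is a cell boundary
CELL = 2 * R / N_CELLS  # cell width, exactly representable in binary


def a(x: float) -> float:
    """Hypothetical drift with a jump at 0, constant outside [-R, R]."""
    x = max(-R, min(R, x))
    return math.sin(x) if x < 0 else 1.0 + 0.5 * x


def sigma(x: float) -> float:
    """Hypothetical diffusion coefficient with a jump at 0."""
    return 1.0 if x < 0 else 2.0


def g(x: float) -> float:
    """The ratio a / sigma^2 that has to be approximated."""
    return a(x) / sigma(x) ** 2


def h(x: float) -> float:
    """Piecewise constant approximation of g: its value at the cell midpoint."""
    if x < -R or x >= R:
        return g(x)  # g is already constant on these half-lines
    k = math.floor((x + R) / CELL)
    return g(-R + (k + 0.5) * CELL)


def alpha(x: float) -> float:
    """Drift correction for which (a + alpha * sigma) / sigma^2 equals h."""
    return sigma(x) * (h(x) - g(x))


# Finite evaluation grid on [-3, 3]; this is a numerical check, not a proof.
grid = [j / 1000.0 for j in range(-3000, 3001)]
sup_alpha = max(abs(alpha(x)) for x in grid)
max_residual = max(
    abs((a(x) + alpha(x) * sigma(x)) / sigma(x) ** 2 - h(x)) for x in grid
)
```

Refining the partition (a larger `N_CELLS`) makes `sup_alpha` smaller, which mirrors the requirement that the uniform norm of α not exceed γ.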