MathematicS In Action

We establish a connection between two population models by showing that one is the scaling limit of the other as the population grows large. In the infinite population model, individuals are split into two subpopulations, carrying either a selectively advantageous allele or a disadvantageous one. The proportion of disadvantaged individuals in the population evolves according to the Λ-Wright–Fisher stochastic differential equation (SDE) with selection, and the genealogy is described by the so-called Bolthausen–Sznitman coalescent. This equation appeared in the Λ-lookdown model with selection studied by Bah and Pardoux [1]. Schweinsberg in [16] showed that in a specific setting, due to the strong selection, the genealogy of the so-called Moran model with selection converges to the Bolthausen–Sznitman coalescent. By splitting the population into two adversarial subgroups and adding a weak selection mechanism, we show that the proportion of disadvantaged individuals in the Moran model with strong and weak selection converges to the solution of the Λ-Wright–Fisher SDE of [1].


Introduction
The Moran model is a classical model in population genetics. It describes the evolution in continuous time of a haploid population with constant size, where generations are overlapping. Every individual dies at rate 1 and is instantaneously replaced by a copy of an individual chosen uniformly at random in the remaining population, including the individual who just died. It is well known that the genealogy of the Moran model is described by the so-called Kingman's coalescent, which is the only exchangeable coalescent process where merging events are only binary and non-simultaneous. Some kind of universality of Kingman's coalescent for the genealogy of discrete-time population models with fixed size was established in [12]; this result is known as Möhle's Lemma. In [13], Möhle also obtained convergence results towards different coalescents, and even allowed the size of the population to vary.
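The neutral replacement dynamics described above can be sketched in a few lines of Python (a toy simulation for illustration only; the function and variable names are ours, not from the papers cited):

```python
import random

def moran_step(pop, rng):
    """One replacement event of the neutral Moran model: a uniformly chosen
    individual dies and is instantaneously replaced by a copy of a parent
    chosen uniformly at random in the whole population (dead one included)."""
    dead = rng.randrange(len(pop))
    parent = rng.randrange(len(pop))   # uniform choice: no selection
    pop[dead] = pop[parent]

rng = random.Random(0)
pop = ["X"] * 5 + ["Y"] * 5
for _ in range(10_000):
    moran_step(pop, rng)
# The population size stays constant, and in the long run one allele fixes.
```

The constant population size and the overlapping generations are visible directly in the code: each event removes exactly one individual and adds exactly one copy.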
In the Moran model of size N, when the population is split into two subgroups, say the individuals carrying allele X and the ones carrying allele Y, the proportion (X_t)_{t≥0} of allele X in the whole population converges as N → ∞, when speeding up time by a factor N, towards the Wright–Fisher diffusion, that is, the solution to the SDE
\[
\mathrm{d}X_t = \sqrt{X_t(1-X_t)}\,\mathrm{d}W_t,
\]
where W is a standard Brownian motion. Note the symmetry between the two alleles, reflecting the fact that neither of them has a selective advantage over the other. One of the implications is the well-known duality relation between the number of blocks in Kingman's coalescent and the Wright–Fisher diffusion, as stated in [2, Theorem 2.7]. Namely, denoting by K_t the number of blocks in Kingman's coalescent at time t, it holds for all x ∈ (0, 1) and k ∈ ℕ that
\[
\mathbb{E}\big[X_t^k \,\big|\, X_0 = x\big] = \mathbb{E}\big[x^{K_t} \,\big|\, K_0 = k\big]. \qquad (1.1)
\]
The duality actually holds for more general coalescents, namely Λ-coalescents, and some Fleming–Viot processes, as shown in [3, Equation (18)], that is, for K_t the number of blocks at time t in a Λ-coalescent and (X_t)_{t≥0} the solution of some SDE. The coalescence rates in a Λ-coalescent are characterized by a finite measure Λ on [0, 1], such that if the coalescent contains k blocks at some given time, any sub-family of ℓ among the k blocks merges at rate
\[
\lambda_{k,\ell} = \int_0^1 p^{\ell-2}(1-p)^{k-\ell}\,\Lambda(\mathrm{d}p).
\]
In particular, the blocks are exchangeable, in the sense that all the possible combinations of ℓ blocks have the same rate of merging. Kingman's coalescent corresponds to the case Λ = δ_0, the Dirac mass at 0.
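In the Bolthausen–Sznitman case Λ(dp) = dp, the merging rate of any ℓ fixed blocks among k, namely ∫_0^1 p^{ℓ−2}(1−p)^{k−ℓ} dp, is a Beta integral with closed form (ℓ−2)!(k−ℓ)!/(k−1)!. A quick numerical sanity check in Python (the helper name is ours):

```python
from math import factorial

def bs_merge_rate(k, l):
    """Rate at which a given family of l blocks among k merges in the
    Bolthausen-Sznitman coalescent (Lambda(dp) = dp):
    int_0^1 p^(l-2) (1-p)^(k-l) dp = (l-2)! (k-l)! / (k-1)!."""
    if not 2 <= l <= k:
        raise ValueError("need 2 <= l <= k")
    return factorial(l - 2) * factorial(k - l) / factorial(k - 1)

# Any fixed pair among k = 5 blocks merges at rate 1/(k - 1) = 1/4.
print(bs_merge_rate(5, 2))  # 0.25
```

With k = 2 the formula gives rate 1, as it must since Λ([0, 1]) = 1 for Λ(dp) = dp.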
Another instance of a Λ-coalescent is the Bolthausen–Sznitman coalescent introduced in [6], corresponding to Λ(dp) = dp. Its importance is due to its connections to models such as spin glasses, continuous branching processes, travelling waves, and some population models; see e.g. [2] and the references therein. The populations where one expects to observe the Bolthausen–Sznitman coalescent are, for instance, populations undergoing strong selection [16], exploring uninhabited territories [7], or quickly adapting to the environment [9, 14]. In those cases, an individual sometimes reproduces more (or faster) and generates a family of size of the same order as the population size.
Moran model with selection and Λ-lookdown model. When a death occurs in the Moran model, instead of choosing an individual uniformly at random to reproduce, we can include a selection component in the dynamics and choose the parent proportionally to its fitness. An instance of a Moran model with selection has been studied by Schweinsberg in [15] and [16], where the individuals accumulate beneficial mutations increasing their reproduction rates; this is the model we are interested in in this work, and we will describe it in more detail later on. The main result of [16] establishes that the genealogy of Schweinsberg's Moran model with selection converges towards the Bolthausen–Sznitman coalescent as the size of the population goes to infinity. Let us connect it with another population model.
In [1], the authors study an infinite size population model called the Λ-lookdown model with selection, whose genealogy is that of the corresponding Λ-coalescent. We are of course interested in the Bolthausen–Sznitman case Λ(dp) = dp on [0, 1]. Each individual carries either allele X or allele Y, the selection advantaging the individuals of type X. Theorem 3.5 in [1] shows that the proportion (Y_t)_{t≥0} of carriers of Y is the solution of the following SDE:
\[
Y_t = Y_0 - \alpha \int_0^t Y_s(1-Y_s)\,\mathrm{d}s + \int_{[0,t]\times[0,1]\times[0,1]} p\big(\mathbb{1}_{\{u \le Y_{s-}\}} - Y_{s-}\big)\,\bar{M}(\mathrm{d}s,\mathrm{d}u,\mathrm{d}p), \qquad (1.2)
\]
where \bar{M} is the compensated measure of a Poisson point process M with intensity ds ⊗ du ⊗ p^{−2}dp, and α ≥ 0 represents the selective advantage of X over Y. In [1], Equation (1.2) is called the Λ-Wright–Fisher SDE with selection. The previously mentioned duality (1.1) in this case is between the solution of (1.2) with α = 0 and the associated Λ-coalescent. One may wonder whether it is possible to split the individuals in the Moran model with selection into two adversarial subgroups (X versus Y), in order to observe the convergence of the proportion of the disadvantaged group Y towards the solution of (1.2). The goal of this work is to answer this question.

Previous results
We describe more formally the Moran model with selection and the results of Schweinsberg in [15] and [16].
We consider a population of fixed size N ∈ ℕ. Each individual dies at rate 1, meaning that its lifetime is an exponential random variable with parameter 1. At time 0, the N individuals carry no mutation. Each of them acquires, at rate µ = µ_N, which may depend on N, a mutation that adds up to its current number of mutations. We call the number of mutations carried by an individual its type. When a death occurs, say at time t, the individual is instantaneously replaced by a copy of an individual chosen in the population at time t, including the one who just died, independently of the past. The parent is chosen at random proportionally to its fitness at time t, as explained below, and the newborn individual then inherits the type of its parent.
For all j ≥ 0 and t ∈ ℝ_+, we denote by W_j(t) the number of individuals of type j at time t in the population. The average number of mutations at time t is thus given by
\[
M(t) := \frac{1}{N} \sum_{j \ge 0} j\, W_j(t).
\]
Let s = s_N > 0 be the coefficient of selection and let the fitness of the type j at time t be max(1 + s(j − M(t)), 0). If a death occurs at time t, the probability that a particular individual of type j reproduces is
\[
\frac{\max(1 + s(j - M(t)), 0)}{\sum_{i \ge 0} W_i(t) \max(1 + s(i - M(t)), 0)},
\]
which becomes
\[
\frac{1 + s(j - M(t))}{N} \qquad (2.1)
\]
when all the fitnesses are positive. The neutral Moran model corresponds to the case where s = 0, that is, all the individuals have the same probability to reproduce. We stress that, thus defined, every mutation in our model is beneficial. Define
\[
a_N := \frac{\log(s/\mu)}{s}, \qquad k_N := \frac{\log N}{\log(s/\mu)},
\]
which are proven in [15] to be the scaling constants such that in a_N units of time, the difference between the largest type at time t and the largest type at time t + a_N is of order k_N. The assumptions (A1)–(A3) on the parameters of the model are those of [15] and [16]. In particular, they imply that s → 0 and k_N, a_N → ∞ as N → ∞, and that µ ≪ s^a for any a > 0; this is the content of (2.3). We refer to [15] and [16] for more detailed discussions on these assumptions. We will recall later on the fact from [15] that the number of types that have appeared before time a_N T is of order at most O(k_N) with high probability. Since s k_N → 0 as N → ∞ by Assumption (A3), the fitnesses for these types are always positive with high probability, that is, the fitnesses are given by (2.1).
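To illustrate the selection step, here is a small Python sketch (our own names and toy numbers) computing the reproduction probabilities from the type counts; when every fitness is positive, the normalizing constant equals N, which recovers the simplified formula (2.1):

```python
def reproduction_probs(counts, s):
    """Per-individual probability of being chosen as parent in the Moran
    model with selection: proportional to max(1 + s*(j - M), 0), where M
    is the mean type.  counts[j] = number of type-j individuals."""
    N = sum(counts.values())
    M = sum(j * w for j, w in counts.items()) / N
    fitness = {j: max(1.0 + s * (j - M), 0.0) for j in counts}
    total = sum(fitness[j] * counts[j] for j in counts)
    return {j: fitness[j] / total for j in counts}

counts = {0: 60, 1: 30, 2: 10}            # mean type M = 0.5
probs = reproduction_probs(counts, s=0.01)
# All fitnesses are positive here, so total = N and
# probs[j] = (1 + s*(j - M)) / N, i.e. formula (2.1).
```

Summing probs[j] over all N individuals gives 1, since the fitnesses average to 1 by the centering around M.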
The main results of Schweinsberg in [15] concern the dynamics of the type distribution in the population as N → ∞. Theorem 1.4 in [15] shows that after a_N units of time, the distribution of the types starts looking like that of a Gaussian variable with vanishing variance. Theorem 1.2 in [15] states that M(a_N t)/k_N converges in probability, uniformly on compact sets of (0, 1) ∪ (1, ∞), towards a function m. A similar convergence holds for the difference between the fittest individuals (the highest type alive) and the mean type, as shown in [15, Theorem 1.1], towards the map q : ℝ_+ → ℝ_+ that satisfies (2.4). This fully describes the dynamics of the type distribution as N → ∞, forward in time. It enabled Schweinsberg in [16] to show, looking backward in time, that the genealogy of the process converges in finite-dimensional distributions towards the Bolthausen–Sznitman coalescent.
Following the fittest type. An important result in [16] is that, when sampling n individuals in the population at time a_N T and looking backwards in time, after a_N units of time, all the sampled individuals essentially share the same type with high probability, and so do their ancestors, the common type being the fittest (i.e. largest) type in the population. This can be rephrased as follows: after a_N units of time forward, only the individuals that were among the fittest have begotten a non-negligible offspring. In order to track the fittest type, Schweinsberg discretises time at stopping times defined as follows: for all j ≥ 1, let
\[
\tau_j := \inf\{t \ge 0 : W_{j-1}(t) \ge \lceil s/\mu \rceil\}. \qquad (2.5)
\]
In words, τ_j is approximately the time when type j mutations start occurring, making j − 1 the fittest type in the population at time τ_j; see [15, Equation (3.16)] and the associated discussion. Note that s/µ → ∞ as N → ∞ by (2.3), but s/(µN) → 0. Roughly speaking, this means that although the largest type represents a fraction of the whole population close to 0, once this type reaches a size ⌈s/µ⌉ := inf{n ≥ 1 : n ≥ s/µ}, it starts evolving in a very predictable way, which is the reason why this discretisation is powerful. In particular, it is when a type j mutation occurs relatively shortly after τ_j that a large family is likely to descend from it, due to the fact that the fitness is relative to the mean (meaning that individuals mutating faster than usual are strongly advantaged for reproduction). To follow the largest type, we introduce the random index j(t), which is adapted to the natural filtration of the process (W_j(t); j ≥ 0)_{t≥0}. We stress that the notation j(t) refers to another quantity in [15, 16].
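The threshold ⌈s/µ⌉ = inf{n ≥ 1 : n ≥ s/µ} is the usual ceiling capped below by 1; for instance (toy values of s and µ, our own helper name):

```python
from math import ceil

def threshold(s, mu):
    """inf{n >= 1 : n >= s/mu}: the size at which the fittest type starts
    evolving predictably, i.e. max(1, ceil(s/mu))."""
    return max(1, ceil(s / mu))

print(threshold(0.01, 0.0003))  # 34, since s/mu = 33.33...
```

In the regime of the paper, s/µ → ∞, so the cap at 1 is only relevant for illustrative parameter values where s < µ.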

Adding the weak selection dynamics
Recall that we want to divide our population into two adversarial subgroups, say X and Y, giving a selective advantage to X, such that the proportion of Y-individuals converges towards the solution of (1.2) as N → ∞. It is important to note that this new selection between groups X and Y should leave unchanged the selection between the different types. Henceforth, we will use the name type, without further precision, to refer to the number of mutations carried by an individual, never to its group X or Y alone. Nonetheless, we will sometimes use the condensed type (Y, j) to refer to both the group and the type of an individual. For technical reasons, due to the fact that the population takes about a_N units of time to reach the Bolthausen–Sznitman dynamics, we study the proportion of Y-individuals starting only from time τ_{j(2)}, when the type distribution already looks like a Gaussian distribution. Let (y_N)_{N∈ℕ} be a sequence in (0, 1) such that y_N → y ∈ (0, 1) as N → ∞ and y_N⌈s/µ⌉ ∈ {1, . . ., ⌈s/µ⌉}. At time τ_{j(2)}, we mark uniformly at random exactly y_N⌈s/µ⌉ type j(2) − 1 individuals to be in the group Y. Each individual of type j < j(2) − 1 is marked with probability y_N. All the individuals in the population that are not in the Y group form the X group. The usual reproduction mechanism is left unchanged by the membership of X or Y. During a reproduction event, the child inherits the group of its parent.
Recall the definition of the map q in (2.4). Starting from some deterministic integer k*_N defined above the forthcoming equation (3.7), Schweinsberg introduced recursively defined deterministic times τ*_j approximating the random times τ_j, j ≥ k*_N + 1. From these times we then define the deterministic factors r_j, j ≥ k*_N + 1, appearing in the killing probability (2.9) below. We add a new selection mechanism, that we call weak selection, operating between the groups X and Y as follows. Set the weak selection coefficient α ≥ 0, which does not depend on N. Let Y_j(t) be the number of (Y, j)-individuals at time t for j ≥ j(2), and define X_j(t) similarly for the (X, j)-individuals. Every time a (Y, j)-individual acquires a (j + 1)-th mutation, say at a time t, it is instead killed with the probability given in (2.9). Each killing is immediately compensated by choosing an individual uniformly at random among the X_j(t) (X, j)-individuals to give birth to an (X, j + 1)-individual. Note that the dynamics of the types (i.e. numbers of mutations) thus remain unchanged. Let (Y^N_t)_{t≥2} be the càdlàg version of the process which, informally, follows the proportion of Y-individuals among the fittest ones. Note that the process (Y^N_t)_{t≥2} is not Markovian, due to the interactions between the types. Moreover, the individuals counted in Y_j(τ_{j+1}) need not be children of the individuals counted in Y_{j−1}(τ_j). We now state our main result.
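The key point that the killings do not alter the type dynamics can be checked on a toy implementation (a sketch with our own data layout, not the construction used in the proofs): whether or not the mutating (Y, j)-individual is killed, exactly one type-j individual becomes a type-(j + 1) individual.

```python
import random

def mutation_with_weak_selection(pop, i, kill_prob, rng):
    """pop: list of (group, type) pairs.  Individual i, in group Y with
    type j, acquires its (j+1)-th mutation.  With probability kill_prob it
    is killed and a uniformly chosen (X, j)-individual begets an (X, j+1)
    child filling the vacant slot; otherwise the mutation goes through.
    Either way the type counts change in exactly the same manner."""
    group, j = pop[i]
    assert group == "Y"
    x_slots = [k for k, (g, t) in enumerate(pop) if g == "X" and t == j]
    if x_slots and rng.random() < kill_prob:
        parent = rng.choice(x_slots)   # the compensating (X, j) parent
        pop[i] = ("X", j + 1)          # its child takes the vacant slot
    else:
        pop[i] = ("Y", j + 1)          # ordinary mutation
    return pop

rng = random.Random(0)
pop = [("Y", 5), ("X", 5), ("X", 5)]
mutation_with_weak_selection(pop, 0, kill_prob=0.5, rng=rng)
```

Only the group label of the new type-(j + 1) individual depends on the outcome of the killing, which is why only the genealogy, and not the type distribution, is affected.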
The strong uniqueness of the solution of (1.2) is proven in [8, Theorem 4.1].
Random scaling of the weak selection. The killing probability (2.9) involves the deterministic scaling factors r_j, j ≥ k*_N + 1. Throughout the paper, instead of deriving approximations of r_j in every intermediate result, we shall rather use specific random factors to simplify the proofs; when establishing the convergence of Theorem 2.1, we will only need to approximate r_j once. We redefine the weak selection mechanism by replacing (2.9) with (2.10), where q_{j+1} is a random variable, defined later on in (3.8), that can be understood as the difference between the largest type and the average type in the population at the time when type j + 1 starts appearing. It is measurable with respect to the natural filtration of the population at this time, and we assume these killings to be independent of all randomness after this time. We will see in the forthcoming Lemma 3.2 that 1/q_{j+1} appears naturally as a scaling of the weak selection in order to observe a non-trivial limit as N → ∞, and that r_j ≈ q_j in Equation (4.5), so that the two probabilities (2.9) and (2.10) are, in some sense, asymptotically equivalent.
Organisation of the paper.
• Section 3 contains technical tools for the proof of Theorem 2.1, divided into three subsections: (1) The first one recalls notation and results from [15] and [16]. Proposition 3.1 describes the evolution of the types and related quantities; Lemma 3.2 controls the lapse τ_{j+1} − τ_j between the random discretisation times defined in (2.5). Sometimes, an individual has a much larger than usual number of descendants born between these times; Lemma 3.3 approximates the law of the size of such a large family. (2) The second subsection introduces some notation and two new populations whose dynamics need to be described. Informally, the first population is identical to our model except that the most recent killings are cancelled, whereas the second population consists of (X, j)-individuals descending from a recent killing. They are of interest because one can retrieve the proportion of Y-individuals in the original population from these two. Lemma 3.4 shows that the weak selection does not interfere much with the strong selection, in the sense that with high probability, the same individual is not affected by the two mechanisms at the same time.
(3) The third subsection adapts the techniques of Schweinsberg based on martingales to investigate the fluctuations of different subpopulations. It is divided into 5 subsubsections, whose organisation is made precise at the beginning of the subsection. Lemmas 3.5 and 3.6 are technical results that serve to obtain the approximation of the (Y, j − 1)-individuals when the (Y, j)-individuals start appearing in the population; this approximation is made in Lemma 3.7. Lemma 3.8 contains tools to control the number of Y-individuals as well as the number of individuals descending from killings. They will be used to derive Lemma 3.9, which describes the evolution of the proportion of Y-individuals when one individual in the population reproduces much more than the others, due to the strong selection. Lemma 3.10 shows that, in expectation, in the absence of weak selection and as long as no type j individual appears too close to time τ_j, the proportion of Y-individuals remains constant. The expected effect of the weak selection on the proportions is obtained in Lemma 3.11.
• Section 4 is devoted to the proof of Theorem 2.1. To show the convergence of the process when the killing probability is given by (2.10), the strategy is the following. We first establish the tightness of Y^N in Lemma 4.1. Next, we show in Lemma 4.2 that the expectation of the increment of the proportion of Y-individuals from j to j + 1 is very close to the generator of the solution of (1.2). We then introduce a martingale problem in Lemma 4.3, which states that any weak limit of Y^N solves it. We continue the argument with Lemma 4.4, which states that this weak limit is therefore a solution of (1.2). The section ends with the extension of the convergence to the process whose killing probability is given by (2.9), through a coupling.

Schweinsberg's setting and notation
In this subsection, we introduce the notation used in [15, 16] and we recall some of the results we will need. Thus, what follows does not directly concern the dynamics of the two groups X and Y, but rather that of the type distribution. Fix a real number T > 2, arbitrarily large. Fix ϵ, δ ∈ (0, 1) satisfying condition (3.1), as required in [16, Equation (5.1)]. We will study the process up to time a_N T, and control its behaviour with a probability greater than 1 − ϵ, with accuracy δ. We shall denote by C_i, i ∈ ℕ, constants that can depend on δ, ϵ, T, whereas C will always refer to a constant independent of those parameters, that may vary from line to line. We will also keep C independent of the weak selection coefficient α of (2.10).
We introduce some tools to study the evolution of a type. Denote by B_j(t) and D_j(t) respectively the birth-rate and the death-rate per individual of type j at time t ≤ a_N T, and define
\[
G_j(t) := B_j(t) - D_j(t).
\]
The value of G_j(t) is the growth-rate per individual of type j at time t. Thus, as in [15] and [16], we can define for all j ≥ 0
\[
Z_j(t) := e^{-\int_0^t G_j(v)\,\mathrm{d}v}\, W_j(t) - W_j(0) - \int_0^t \mu\, W_{j-1}(u)\, e^{-\int_0^u G_j(v)\,\mathrm{d}v}\,\mathrm{d}u. \qquad (3.5)
\]
Let (F^N_t)_{t≥0} denote the natural filtration of (W_j(t), j ≥ 0)_{t≥0}. For all j ≥ 0, the process (Z_j(t))_{t∈[0, a_N T]} is a square integrable martingale with respect to F^N, the variance of which is given for t ∈ [0, a_N T] by (3.6); see [15, Proposition 5.1]. The role of Z_j(t) is to control the fluctuations of W_j(t) as follows: we rewrite (3.5) as
\[
W_j(t) = e^{\int_0^t G_j(v)\,\mathrm{d}v} \Big( W_j(0) + Z_j(t) + \int_0^t \mu\, W_{j-1}(u)\, e^{-\int_0^u G_j(v)\,\mathrm{d}v}\,\mathrm{d}u \Big).
\]
Then one sees that if Z_j(t) is much smaller than e^{-\int_0^t G_j(v)\mathrm{d}v} W_j(t), then describing W_j(t) reduces to describing e^{\int_0^t G_j(v)\mathrm{d}v} and W_{j−1}(u) up to time t. To show that Z_j is small with high probability, the general strategy is to bound its variance given by (3.6). Roughly speaking, Z_j is a martingale because e^{\int_0^t G_j(v)\mathrm{d}v} is the expected number of individuals alive at time t in a pure birth process starting from a single individual. Hence, one sees that e^{-\int_0^t G_j(v)\mathrm{d}v} W_j(t) would be constant in expectation in the absence of immigration by mutations. The integral in (3.5) is exactly the term needed to compensate these mutations and their offspring. The variance in (3.6) follows from stochastic calculus, as shown in [15, Section 5].
We will often work with variants of the martingales Z_j. We will always admit that they are indeed martingales, for the same reasons as those sketched above, as well as the formulas for their variances.
In [15, 16], Schweinsberg often distinguishes whether j is greater or smaller than the constant k*_N defined in (3.7). This constant is, roughly speaking, the first type after which the type distribution looks like a Gaussian distribution. For j ≥ k*_N + 1, the quantity q*_j is then defined. As we mentioned before, all the individuals have type 0 at time 0 and the wave dynamics starts approximately around time a_N, which is why the condition in the definition of q*_j is needed in [15, 16]. However, we will mainly be interested in types j ≥ j(2), for which q*_j = j − M(τ_j). Recall that τ_j defined in (2.5) is approximately the time when one expects to see the first type j mutations, hence q_j is an approximation of the difference between j and the average number of mutations when individuals of type j start appearing. For j ≥ k*_N + 1, we also define the time ξ_j in (3.10). We shall work on a specific event, realized with high probability, on which τ_j < ξ_j < τ_{j+1} for all j ≥ k*_N + 1 such that τ_{j+1} < a_N T. The goal of ξ_j is to distinguish whether a mutation is faster than usual: we call a type j mutation an early type j mutation if it occurs in the time interval [τ_j, ξ_j]. The fitness being relative to the mean, the earlier a mutant is, the stronger its advantage to reproduce immediately after the mutation. The individual acquiring an early type j mutation, as well as its offspring, are called early type j individuals. In general, we will speak of early type j individuals during the time interval [τ_j, τ_{j+1}], such that (on the high probability event we consider) they still have type j when thus called. Schweinsberg showed that large families appear with the Bolthausen–Sznitman rates as a result of early mutations.
In [15, 16], ζ = ζ_N denotes a stopping time that is essentially the first time when the type distribution behaves atypically, i.e. differently from its large-population limiting behaviour. For example, it is defined such that before ζ and for some time after τ_j, the number of type j individuals has a specific exponential growth; see Proposition 3.1 (3) below. Its definition requires many technical considerations that are not relevant for our purposes, and we will merely recall the properties that we need and that hold before time ζ. The precise definition of ζ is given in [15, Section 3.3]. In particular, for N large enough, it holds that
\[
\mathbb{P}(\zeta > a_N T) > 1 - \epsilon.
\]
Throughout the paper, we will say that a property holds on some event E if it is true for ℙ-almost every ω ∈ E. Similarly, if we say that on the event E, E(β) ≤ c for some random variable β and some constant c, we mean that E(β | E) ≤ c.
We shall work on {ζ > a_N T}, so that the properties of the next proposition hold. Besides, the properties listed in Proposition 3.1 below always hold under the conditions given in their respective statements. The results it gathers are taken from [15] and [16] as follows: • (1) is taken from both Proposition 3.3 (1) and Proposition 3.6 (3) in [15].
• (5) is taken from Lemma 4.5 in [15].

Proposition 3.1. Recall that k*_N + 1, defined above Equation (3.7), is the first type we are interested in, and that for all j ≥ 1, τ_j defined in (2.5) is roughly the time when type j starts appearing in the population. The real numbers ϵ, δ ∈ (0, 1) do not depend on N and were fixed such that (3.1) is satisfied. For N large enough, the following hold: (1) For all j ≥ k*_N + 1 such that τ_{j+1} ≤ ζ ∧ a_N T, no early type j individual acquires a (j + 1)-th mutation before time τ_{j+1}. Furthermore, it holds that
\[
\tau_{j+1} - \tau_j \ge \frac{a_N}{3 k_N},
\]
and on {ζ > a_N T}, we have τ_{J+1} > a_N T for J := 3T k_N + k*_N + 1, so the types greater than or equal to J + 1 have not appeared at time a_N T yet.
(2) gives an upper bound on the number W̄_j(t) of non-early type j individuals; the same upper bound holds for W_j(t) up to the time ξ_j defined in (3.10). (3) describes the exponential growth of the number of type j individuals after τ_j. (4) states that sup_{t∈[τ_j, τ_{j+1}]} |G_j(t) − s q_j| ≤ s C_1 for some constant C_1 > 0. As explained in the introduction, it will be more convenient for us to study the process starting at time τ_{j(2)}. We also note that by Proposition 3.1 (1) above and Assumption (A3), on the event {ζ > a_N T}, all the fitnesses of the individuals until time a_N T are positive, i.e. (2.1) holds. Moreover, for N large enough, for all t ≤ a_N T and every j ≤ J, on the event {ζ > t}, one has
\[
B_j(t) + D_j(t) \le 3 \qquad (3.12)
\]
by Assumption (A3).
In [16], the study of the process backwards in time requires considering only types j belonging to some set I ⊂ ℕ, defined just before Lemma 6.2 in [16]. Its definition involves a fixed parameter t_0 ∈ (T − 37, T − 2). Choosing t_0 = T − 3, one gets the set I = {j_1, . . ., j_2}, where the τ*_j's are some deterministic times approximating the random τ_j's (see [16, Equation (6.1)]). The relevant information for our purposes is given by Lemma 6.2 in [16], which shows that on the event {ζ > a_N T}, it holds that τ_{j_1} < 2a_N and j_2 ≥ L + 9, where L is defined in Lemma 5.1 of [16]. It entails that τ_{j_2} ≥ τ_L + 9 a_N/(3k_N) by Proposition 3.1 (1). Hence, τ_{j_2} > a_N(T − 1). We thus have that on the event {ζ > a_N T}, j(2) ∈ I and j(T − 1) ∈ I, so that for j(2) ≤ j ≤ j(T − 1), we can use the results of Schweinsberg proven for j ∈ I, since then 2a_N ≤ τ_j ≤ a_N(T − 1). In particular, on the event {ζ > a_N T}, the estimates in Proposition 3.1 hold for j ∈ I, and we will thus apply the proposition for j ∈ I without recalling that this ensures τ_j ≤ a_N T.
We deduce a result on the time length between τ j and τ j+1 that will be useful later on.
Lemma 3.2. For all j ∈ I, conditionally given F^N_{τ_j} and on the event {ζ > τ_{j+1}}, it holds that

Proof. By Proposition 3.1 (4), we know that sup_{t∈[τ_j, τ_{j+1}]} |G_j(t) − s q_j| ≤ s C_1. Equation (8.32) in [16] states that

Therefore we have that

We conclude the first subsection of the toolbox with a reformulation of the result of Schweinsberg in [16] showing that the law of the number of early type j individuals at time τ_{j+1} can be well approximated by the rates corresponding to the Bolthausen–Sznitman coalescent.

Lemma 3.3. For N large enough, for all j ∈ I, j ≥ j(2), conditionally given F^N_{τ_j} and on the event {ζ > τ_{j+1}}, for any g ∈ C^∞([0, 1]), it holds that

where S_j is the proportion of early type j individuals at time τ_{j+1} among the type j individuals and p_{S_j} its probability distribution, supported on {0, 1/⌈s/µ⌉, . . ., 1}.
Proof. Let ν(dx) = dx/x², x ∈ (0, 1]. Lemma 7.8 in [16] shows that for all y ∈ (ϵ, 1 − δ], it holds that

We implicitly used that the event in [16, Equation (7.48)] has probability going to 1 as N → ∞; see Lemmas 7.4 and 7.7 of the same paper. Roughly speaking, on this event the early mutants are coupled with a branching process introduced in Section 7.2 of the same paper, allowing to approximate the law of S_j. We write

We conclude using that δ/ϵ < ϵ by (3.1). □

Splitting strategy to study the weak selection
Recall that, among the type j(2) − 1 individuals at time τ_{j(2)}, we assigned y_N⌈s/µ⌉ of them to the group Y and (1 − y_N)⌈s/µ⌉ to the group X, with the weak selection mechanism explained above Theorem 2.1. One sees that when α = 0, this is exactly the model of Schweinsberg. Moreover, when α ≠ 0, the type distribution remains unchanged (only the genealogy is altered). In the proofs, we will study the fluctuations of each group as if there were no weak selection, and then combine them with estimates on the number of killings. We thus introduce the notation Y̌_j(t), for j ∈ I, j ≥ j(2) and t ∈ [τ_j, τ_{j+1}], for the number of (Y, j)-individuals we would obtain if we cancelled the killings of the weak selection previously described during [τ_j, τ_{j+1}], and only those ones. In particular, for t ∈ [τ_j, τ_{j+1}], denoting by X̌_j(t) the total number of individuals at time t descending from killings during [τ_j, t], one can write Y_j(t) = Y̌_j(t) − X̌_j(t). Hence, our strategy is to control X̌_j(t) and Y̌_j(t) separately, before combining them to obtain control on Y_j(t).
To obtain (1.2), the weak selection should have asymptotically no effect on the strong selection.
Schweinsberg in [15] is able to couple the early type j individuals and their progeny with a branching process, for a certain amount of time. This allows him to bound the probability that an early mutation survives. Let E_j be the event that a Y-individual is killed by the weak selection during an early type j mutation in [τ_j, ξ_j], and that the resulting (X, j)-individual has descendants that are alive at time τ_{j+1}. We complete the filtration F^N to take into account the groups X and Y of the individuals. The following lemma shows that these problematic events occur with negligible probabilities.

Lemma 3.4. For all j ∈ I, j ≥ j(2), on the event {ζ > τ_j}, it holds that

Proof. By independence, the probability of E_j is the product of the probabilities of a surviving early mutation and of a killing, the former being upper bounded by Lemma 7.8 in [16] (this lemma actually bounds the probability of survival up to a time τ′_j, which is anyway smaller than τ_{j+1} on the event {ζ > τ_{j+1}}). Combining this bound and (2.10), we obtain the claim. □

From Lemma 3.4, we see that as N → ∞,

where we used that the number of elements in I is smaller than J ≤ 4k_N T; see the discussion after (3.13). Hence, we redefine ζ to include the first time at which an event E_j occurs. With this new definition, one can still choose N large enough such that P(ζ > a_N(T − 1)) > 1 − ϵ; in particular, no E_j occurs for any j ∈ I with high probability.

Expected fluctuations of the proportions
We divide this section into five parts. In the first one, we look at the effect of the weak selection on the type j − 1, that is, the second fittest type during [τ_j, τ_{j+1}], when the fittest type j starts building up. In the second one, we study the non-early (Y̌, j)-individuals during [ξ_j, τ_{j+1}]. In the third one, we describe the impact of an early mutation on the proportion of Y-individuals. In the fourth one, we introduce the discrete process, indexed by j, following the proportion of (Y, j)-individuals at time τ_j. The importance of the weak selection, that is, the expected number of killings of (Y, j)-individuals occurring in [τ_j, τ_{j+1}], is discussed in the fifth one.
3.3.1. The type j − 1 during [τ_j, τ_{j+1}]. Let X̌^{τ_j}_{j−1}(t) be the number of (X, j − 1)-individuals at time t ≥ τ_j descending from a killing that occurred after time τ_j. In order to properly estimate X̌_j(τ_{j+1}), one needs to control X̌^{τ_j}_{j−1}(t) for all t ∈ [τ_j, τ_{j+1}], since type j individuals can come from mutants of these type j − 1 individuals. The next lemma enables us to do so.

Lemma 3.5. For N large enough, on the event {ζ > τ_{j+1}}, for all j ∈ I, j ≥ j(2), with probability

Proof. We admit the two following statements without proof, referring to [15, Section 5] for details on how to prove them. For all j ∈ I, j ≥ j(2):
• the process defined for t < τ_j by Ž^{τ_j}_{j−1}(t) = 0 and for t ≥ τ_j by (3.15) is a mean zero, square integrable martingale;
• its conditional variance is given by (3.16).
The first step of the proof is to bound (3.16). Since X_{j−2}(t) and Y_{j−2}(t) are smaller than W_{j−2}(t) by definition, Proposition 3.1 (3) then (5) entail that (3.17) holds. On the other hand, B_{j−1}(u) + D_{j−1}(u) ≤ 3 for all u ≥ τ_{j−1} by (3.12). We apply Proposition 3.1 (4), then use (3.15) and the martingale property of Ž^{τ_j}_{j−1}, the last inequality following from (3.17). Thus, applying Proposition 3.1 (5) and coming back to (3.16), we have shown thanks to (3.17) and (A3) that (3.19) holds. Proposition 3.1 (1) gives s(τ_j − τ_{j−1}) ≥ s a_N/(3k_N) = log(s/µ)/(3k_N), which tends to ∞ as N → ∞ as a consequence of Assumption (A2) and k_N → ∞. Therefore, Doob's maximal inequality for square integrable martingales applies, since µ ≪ s^a for any a > 0 by (2.3). Then, we use Proposition 3.1 (3) and Proposition 3.1 (5). This and (3.19) together conclude the proof. □

The following lemma will allow us to control the fluctuations of the (Y, j + 1)-individuals by controlling those of the (Y, j)-individuals between [τ_{j+1}, τ_{j+2}].

Lemma 3.6. For
all j ∈ I, j ≥ j(2), the following process is a square integrable martingale: Moreover, for N large enough, for all t ∈ [τ j+1 , τ j+2 ∧ ζ), one has the following upper bound for its conditional variance: Proof.Again, we admit that Z Y j is a square integrable martingale.We have α q j−1 W j−1 (u) ≤ α q j−1 , which tends to 0 as N → ∞ by Proposition 3.1 (4).One then gets the upper bound for the variance as a direct consequence of Lemma 9.27 in [15] (the process Z ′′ used in that Lemma is introduced in [15, p. 85, below Equation (9.82)]).□ The next lemma shows that the evolution of Y j−1 (τ j + t) until τ j+1 ∧ ζ remains predictable.
Lemma 3.7. For all j ∈ I, j ≥ j(2) + 1, on the event {ζ > τ_j}, it holds that
Furthermore, the same statement holds with X_{j−1} instead of Y_{j−1}.
Proof. The statements (2), (3), (4) of Proposition 3.1 hold up to time t on the event {ζ > t}. Write
We shall bound the last two terms in the above parentheses. By Proposition 3.1 (3), one has a bound in e^{−s(u−τ_{j−1})}; then, using Proposition 3.1 (5),
By (3.18), we thus have shown that for N large enough, on the event {ζ > t},
Furthermore, using Lemma 3.6 and Doob's maximal inequality for square integrable martingales, one has
which completes the proof of the statement for Y_{j−1}.
The proof of the statement for X_{j−1} is identical. □
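Doob's maximal inequality for square integrable martingales, P(sup_{t≤T} |M_t| ≥ λ) ≤ E[M_T²]/λ², is used repeatedly in the proofs above. As a quick sanity check, the following Monte Carlo sketch verifies it on a simple ±1 random-walk martingale (the walk, horizon and threshold are illustrative choices, not objects from the paper):

```python
import random

def doob_l2_check(n_paths=5000, n_steps=100, lam=30.0, seed=1):
    """Monte Carlo check of Doob's L^2 maximal inequality
    P(sup_{t<=T} |M_t| >= lam) <= E[M_T^2] / lam^2
    for a simple +/-1 random-walk martingale M."""
    rng = random.Random(seed)
    exceed = 0           # paths whose running maximum reaches lam
    second_moment = 0.0  # accumulates M_T^2 across paths
    for _ in range(n_paths):
        m, running_max = 0, 0
        for _ in range(n_steps):
            m += 1 if rng.random() < 0.5 else -1
            running_max = max(running_max, abs(m))
        exceed += running_max >= lam
        second_moment += m * m
    p_hat = exceed / n_paths                      # empirical left-hand side
    bound = (second_moment / n_paths) / lam ** 2  # empirical right-hand side
    return p_hat, bound
```

With these parameters E[M_T²] = 100, so the bound is about 0.11, while the empirical exceedance probability is far smaller; the inequality is of course far from tight here.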

The non-early individuals
We will need the following lemma to control the non-early individuals.
Lemma 3.8. For j ∈ I, j ≥ j(2), let W_j be the process counting the number of non-early individuals, i.e. those that obtain a jth mutation during [ξ_j, τ_{j+1}) together with their descendants of type j, and let Z_j be the associated martingale, that is, for all t ∈ [ξ_j, τ_{j+1}]:
Then, its conditional variance at time τ_{j+1} satisfies
Moreover, denoting by (Y′_j(t))_{t∈[ξ_j, τ_{j+1}]} the process following the number of non-early (Y, j)-individuals, the same upper bound holds for the martingale defined by
Finally, denoting by (X̃′_j(t))_{t∈[ξ_j, τ_{j+1}]} the number of non-early (X̃, j)-individuals, the following process, defined for t ∈ [ξ_j, τ_{j+1}], is a mean zero, square integrable martingale whose conditional variance at time τ_{j+1} satisfies

Proof. We only show the bound on the variance of Ž^{X̃}_j, since the other statements follow from Lemma 9.12 in [15] (the ideas of the proof therein are similar to those presented here).
On the event {ζ > ξ_j}, the formula for the variance is
We focus on the first term in the parentheses. Since both X_{j−1}(u) and Y_{j−1}(u) are smaller than W_{j−1}(u) by definition, Proposition 3.1 (3) shows that
We then use Proposition 3.1 (4) and get
By Proposition 3.1 (4), we obtain that, on the event {ζ > τ_{j+1}},
For the other term of the variance, by (3.12), we have
We bound the term with X_{j−1}Y_{j−1}/W_{j−1} ≤ W_{j−1} using Proposition 3.1 (3), and we upper bound −G_j(v) ≤ −s(q_j − C_1) with Proposition 3.1 (4). Integrating u from ξ_j to ∞, we can use the martingale property of Ž^{X̃}_j to bound the right-hand side from above by
We use that ∫_{ξ_j}^{u} e^{−s(r−ξ_j)} dr = (1/s)(1 − e^{−s(u−ξ_j)}) to write
where we used Proposition 3.1 (4) for the last inequality. The claim follows since we have now bounded the two terms of the variance (3.22), the first one in (3.23), thus concluding the proof. □
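The elementary integral identity used at the end of the proof, ∫_{ξ_j}^{u} e^{−s(r−ξ_j)} dr = (1/s)(1 − e^{−s(u−ξ_j)}), can be double-checked numerically; in the sketch below the values of s, ξ_j and u are arbitrary:

```python
import math

def exp_integral(s, xi, u, n=200000):
    """Midpoint-rule approximation of the integral of e^{-s(r - xi)}
    over [xi, u], to be compared with the closed form
    (1/s) * (1 - e^{-s(u - xi)})."""
    h = (u - xi) / n
    total = 0.0
    for k in range(n):
        r = xi + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += math.exp(-s * (r - xi)) * h
    return total
```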

After an early mutation
We now investigate the impact of an early mutation on the proportions. The next lemma essentially states that the difference between the proportion of (Y, j − 1)-individuals among the type j − 1 individuals at time τ_j and the proportion of (Y, j)-individuals among the type j individuals at time τ_{j+1} is mostly determined by the number of early type j individuals.
Lemma 3.9. Let j ∈ I, j ≥ j(2), and let S be the proportion of early type j individuals at time τ_{j+1}. Conditionally given F^N_{τ_j}, on the event {ζ > τ_{j+1}}, the following holds with probability at least 1 − Cϵ/k_N: if S > 0, then the early individuals have the same ancestor at time τ_j, and if it belongs to the group X, then

Similarly, if the ancestor belongs to the group Y, then
Proof. Recall the notation W_j and Y′_j for the processes following the non-early type j individuals, respectively the non-early (Y, j)-individuals. We write
where Z′_j is a martingale defined in Lemma 3.8. Using Proposition 3.1 (2) and Lemma 3.7, one has, with probability at least
We compute, and claim that this converges to 1 as N → ∞. Indeed, we first write e^{−s(ξ_j−τ_j)} = (sq_j)^{1/q_j} e^{−b/q_j}, and use that on the event {ζ > τ_{j+1}}, Proposition 3.1 (4) shows that
Taking the logarithm of the left-hand side, we get
The second and third terms vanish as N → ∞ since k_N → ∞, and so does the first term thanks to Assumption (A_1); hence the lower bound of (3.25) tends to 1 as N → ∞.

(3.26)
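The convergence (sq_j)^{1/q_j} e^{−b/q_j} → 1 invoked in the proof of Lemma 3.9 amounts, after taking logarithms, to log(sq_j)/q_j − b/q_j → 0. The sketch below freezes s and b at arbitrary values to isolate the limit in q_j (in the paper, s, b and q_j all depend on N, so this is only an illustration of the mechanism):

```python
import math

def log_correction(s, b, q):
    """Logarithm of (s*q)**(1/q) * exp(-b/q); it should vanish as q grows."""
    return math.log(s * q) / q - b / q
```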
We then write
where we have used (3.18). Coming back to (3.24), this gives
Applying Lemma 3.8 and Doob's maximal inequality for square integrable martingales, one has
On the event {ζ > τ_{j+1}}, a double application of Proposition 3.1 (4) gives
(since sk_N → 0 as N → ∞, the constant C_1 has been absorbed in C, which does not depend on the parameters ϵ, δ, T). Hence,
where we used (3.1) and (3.9) for the last inequality (since we chose the constant T > 2, we replaced it by 2 in the denominator to absorb it in C). This result combined with (3.27) entails that, on the event {ζ > τ_{j+1}}, with probability greater than 1 − Cϵ/k_N:
For the lower bound, the same reasoning as for the upper bound gives
Since W(τ_{j+1}) = (1 − S)⌈s/µ⌉, homogenizing the bounds, we get
To conclude, conditionally given F^N_{τ_j}, Lemma 7.5 in [16] bounds from above the probability that two early mutations survive by 2e^{2b}/q_j² ≤ 3e^{2b}/k_N². Then, excluding this event, if there is an early mutation in the group X, easy calculations lead to
The case where the early mutant is a Y-individual is identical, which concludes the proof. □

The discrete proportions process
To ensure that our description of the evolution of the proportions is accurate enough, we introduce a stopped discrete time process as follows. For all j ∈ I, j ≥ j(2) − 1, we define Y^N_j := Y_{j∧j(ζ/a_N)}(τ_{j∧j(ζ/a_N)})/⌈s/µ⌉, which follows the proportion of Y-individuals among the fittest at each time τ_j, stopped at the last type j before j(ζ/a_N) (recall that j(ζ/a_N), defined in (2.6), is the largest j such that τ_j ≤ ζ). The reason to stop the process at j(ζ/a_N) is to ensure that the results above, and in particular Proposition 3.1, apply to Y^N_j. Similarly, we denote by Ỹ^N_j the process defined with Ỹ_{j∧j(ζ/a_N)}, i.e. the process following the proportion of (Y, j)-individuals where the killings of the type j mutations have been cancelled (and only those). We stress that the event {ζ > τ_{j+1}} is included in the above definition, in the sense that the process stops evolving at the last τ_j ≤ ζ, and this fact will be kept implicit when working with Y^N or Ỹ^N.
We now give a lemma controlling the first two moments of the proportions' increments in the absence of weak selection, when there is either no early mutation, or one that does not generate too large a family.

Lemma 3.10. Let S_j be the proportion of early type j individuals at time τ_{j+1} among the type j individuals (potentially, S_j can be 0). For all j ∈ I, j ≥ j(2), it holds that

Proof. Throughout the proof, "population" refers to the population of individuals for which the weak selection during the last time interval [τ_j, τ_{j+1}] has been cancelled (but for which the killings that occurred before τ_j are kept). We will thus speak of type (Ỹ, j)-individuals. Fix j ∈ I, j ≥ j(2). We first note that
We call a Y-individual of type j at time τ_{j+1} good if its ancestor at time τ_j is of type j − 1. We denote by Y_j(τ_{j+1}) the number of good Ỹ-individuals at time τ_{j+1}, and by K_j, respectively K_{Ỹ,j}, the number of type j individuals in the population, respectively of type (Ỹ, j), at time τ_{j+1} that are not good. We have
We pick an individual uniformly at random among the ⌈s/µ⌉ individuals of type j at time τ_{j+1}. Note that it belongs to the group of good Y-individuals if and only if its ancestor at time τ_j is in the group Y with type j − 1, and we call this event B.
Let j_anc ∈ ℕ be the type of its ancestor at time τ_j. We have in particular that P(B | F^N_{τ_j}, j_anc = j − 1, ζ > τ_{j+1}) = Y_{j−1}(τ_j)/⌈s/µ⌉, since its ancestor is chosen uniformly at random among the ⌈s/µ⌉ individuals of type j − 1 at time τ_j. Using that {ζ > τ_{j+1}} is F^N_{τ_{j+1}}-measurable, we then write
Basic properties of probability measures entail that
where the last inequality is from [16, Lemma 6.3]; taking the logarithm and using Assumption (A_2), one can show that the bound is o(1/k_N). We turn our attention to the second moment. Suppose now that we independently sample two individuals, possibly the same, uniformly at random among the ⌈s/µ⌉ individuals of type j at time τ_{j+1}. Denote by j_anc and j′_anc the types of their respective ancestors at time τ_j, and let B′ be the event that they both belong to the good Y group. Let D be the event that the two ancestors are different with j_anc = j′_anc = j − 1. Recall that given D, the ancestors are exchangeable. In particular, given D, the ancestor of the first individual is chosen uniformly at random among the ⌈s/µ⌉ individuals of type j − 1 at time τ_j, and then the ancestor of the second one is chosen uniformly at random among the ⌈s/µ⌉ − 1 that remain, the two ancestors being independent of S_j. We get that
Note that B′ ∩ D^c is included in the event that the two sampled individuals have the same ancestor of type j − 1 at time τ_j. The probability of picking twice the same individual is 1/⌈s/µ⌉. In particular, Equation (8.16) in [16] implies that the probability that two type j individuals at time τ_{j+1} have the same ancestor at time τ_j is bounded by Cϵ/k_N, so that P(B′ ∩ D^c ∩ {ζ > τ_{j+1}, S_j ≤ ϵ} | F^N_{τ_j}) ≤ Cϵ/k_N. However, this probability of coalescence is not computed explicitly in [16]; we give the notation and arguments needed to read Equation (8.16) therein and deduce from it the bound for the probability of coalescence (we do not reprove the claim):
• Π*_N is a coalescent process that coincides at all times τ_ℓ, ℓ ∈ I, with high probability, with the coalescent Π_N that describes the genealogy of the population (see [16, Lemma 8.2] and use the bounds derived in its proof before summing over j ∈ I);
• Y_j in [16] corresponds to our S_j;
• the event Ψ_j is defined at the beginning of the proof of Lemma 8.8, and has probability converging to 1 as N → ∞ thanks to Lemmas 7.4 and 7.7 in [16].
Therefore, the bound in Equation (8.16) of [16] holds true with Π_N in place of Π*_N, and it follows that P(B′ ∩ D^c ∩ {ζ > τ_{j+1}, S_j ≤ ϵ} | F^N_{τ_j}) ≤ Cϵ/k_N. Hence, by (3.29), we obtain that, as claimed,
where we used the last inequality in (3.28). This concludes the proof. □
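The exchangeable-sampling step in the proof of Lemma 3.10 (the first ancestor uniform among the ⌈s/µ⌉ type j − 1 individuals, the second uniform among the ⌈s/µ⌉ − 1 remaining ones) gives probability y(y − 1)/(n(n − 1)) that both ancestors fall in a Y-group of size y out of n. A Monte Carlo sketch with toy sizes (n and y are illustrative stand-ins for ⌈s/µ⌉ and Y_{j−1}(τ_j)):

```python
import random

def both_good_prob(n, y, trials=200000, seed=7):
    """Sample two distinct ancestors uniformly among n individuals,
    y of which form the Y-group (labelled 0..y-1); estimate the
    probability that both sampled ancestors lie in the Y-group."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = rng.sample(range(n), 2)  # uniform, without replacement
        hits += (a < y) and (b < y)
    return hits / trials
```

The estimate should match the exact value y(y − 1)/(n(n − 1)) up to Monte Carlo error.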

Weak selection
For Lemma 3.10 to be useful, it needs to be combined with a description of the number of type j individuals at time τ_{j+1} descending from killings during [τ_j, τ_{j+1}], that is, X̃_j(τ_{j+1}). Note that in Lemma 3.8, the martingale Ž^{X̃}_j is constructed from X̃′_j, namely from the (X, j)-individuals coming from non-early killings, that is to say, killings occurring after time ξ_j. Nonetheless, on the event {ζ > τ_{j+1}}, it holds that X̃′_j(τ_{j+1}) = X̃_j(τ_{j+1}), by definition of ζ; see the discussion following Lemma 3.4. Hence, the expected effect of the weak selection on the proportions from τ_j to τ_{j+1} can be estimated:

Lemma 3.11. For all j ∈ I, j ≥ j(2), on the event {ζ > τ_j}, it holds that

Proof. We address the first claim. Using the martingale Ž^{X̃}_j from Lemma 3.8, we can combine Proposition 3.1 (3) and Lemma 3.7 and see that
⌈s/µ⌉² (e^{−s(ξ_j−τ_j)} − e^{−s(τ_{j+1}−τ_j)}) + O(δα/q_j).
To prove the second claim, let S_j be the number of early type j individuals alive at time τ_{j+1}. It suffices to show that the following is of order ϵ/k_N:
where we also got rid of 1_{S_j=0}. Taking the expectation, we get
where we used Proposition 3.1 (4) and (3.1) for the last inequality. It remains to bound the term with 1_{S_j>0} in (3.31). By the Cauchy–Schwarz inequality, we can bound its expectation by
By Lemma 7.5 in [16], we know that P(S_j > 0 | F^N_{τ_j}) ≤ Ce^b/q_j, which can further be bounded by Ce^b/k_N using Proposition 3.1 (4). Therefore, using the third claim of the lemma (proved below), we see that the above is bounded by o(1/k_N). Using (3.31) together with the first claim proves the second claim of the lemma.
We now show the last identity of the lemma. We simply note that squaring, then taking the expectation in (3.30), and using the same approximations as throughout the proof implies that
Proposition 3.1 (5) then concludes the proof. □

Convergence towards the SDE.
In this section, our strategy to show the convergence is standard for proving convergence of a Markov process to the solution of an SDE: we first establish tightness, then look at the infinitesimal generator and show that any weak limit solves a martingale problem associated with the SDE. The next lemma addresses the tightness of (Y^N_t)_{t∈[2,T−1]}.

Proof. The proof uses Aldous' criterion for tightness, stated e.g. in [10, Chapter VI, Theorem 4.5]. Let λ, θ > 0 and let σ, σ′ denote any two stopping times with respect to the filtration F^N that are bounded by T − 1 and such that σ ≤ σ′ ≤ σ + θ. Splitting the following probability on the event {ζ > a_N T} and its complement entails that
We rewrite the conditional expectation as
where we used that (x + y)² ≤ 2x² + 2y². Recall that on the event {ζ > a_N T}, the number of j ≥ 1 such that a_N σ ≤ τ_j ≤ a_N σ′ is at most ⌈3k_N θ⌉ by Proposition 3.1 (1). We bound the first sum above by
We now bound the first sum of the right-hand side. Let S_j be the number of early type j individuals at time τ_{j+1}. Using our Lemma 3.10 and Lemma 7.8 in [16], we have the following bound:
We next turn our attention to the double sum; we write
therefore we have that
by Lemma 3.10. We now bound the second sum in (4.2). Proceeding similarly as before, we obtain
by Lemma 3.11. Hence, coming back to (4.1), we have shown that for all λ > 0, it holds that
lim_{θ→0} lim sup_{N→∞}
Since the left-hand side does not depend on ϵ, its value is simply 0, which shows that the sequence is tight by Aldous' criterion for tightness.
Even though {Y^N_j ; j ∈ I, j ≥ j(2) − 1} is not Markovian, we shall mimic a classical method for showing convergence of a Markov process through its infinitesimal generator.

Lemma 4.2. On the event {ζ > τ_{j+1}}, for all f ∈ C^∞([0, 1]), all j ∈ I, j ≥ j(2) and N large enough, it holds that

Proof. Let X̃^N_j := X̃_j(τ_{j+1})/⌈s/µ⌉. We write
where we used the Taylor–Lagrange formula. The second term is
by using Lemma 3.11. We then focus on the first term. We have
by Lemmas 3.11 and 3.10. We thus have shown that
We address the last term as follows: let S_j be the number of early type j individuals at time τ_{j+1}; we use the Taylor–Lagrange formula to get the existence of ξ strictly between Ỹ^N_j and Y^N_{j−1} such that
Therefore, applying Lemma 3.10, we see that
We now turn our attention to the difference when an early mutation generates a large family. In the proof of Lemma 3.10, we introduced the notion of "good" type j individuals at time τ_{j+1}, namely those whose ancestor at time τ_j is of type j − 1. We denoted by K_j the number of individuals of type j at time τ_{j+1} that are not good. Recall that by Proposition 3.1 (1), the ancestors at time τ_j of these K_j individuals are not of type greater than or equal to j. On the other hand, Lemma 6.3 in [16] shows that E(K_j 1_{ζ>τ_{j+1}} | F^N_{τ_j}) ≤ 5(s/µ)^{1−1/(3k_N)}. Markov's inequality thus yields that
where the last estimate can be derived by taking the logarithm and using Assumption (A_2). Let p_{S_j} denote the conditional distribution of S_j given F^N_{τ_j}, supported on {0, 1/⌈s/µ⌉, . . . , 1}. Note that if an early mutation occurs as described in Lemma 3.9, on the event {τ_{j+1} < ζ} and given F^N_{τ_j}, the ancestor at time τ_j of the individual who generates the large family, conditionally given that the latter is a good type j individual, is uniformly distributed among the ⌈s/µ⌉ individuals of type j − 1 at time τ_j. Hence, the probability that the early individual is in group Y, respectively X (given that there was an early type j mutation), is Y^N_{j−1}, respectively 1 − Y^N_{j−1}, up to a term of order o(1/(ϵk_N)), as discussed above. Thanks to Lemma 3.9, we can write
where E is the error coming from the approximation in Lemma 3.9 and from the probability that this approximation does not hold. In particular, we have that
where we used Lemma 7.8 in [16], Proposition 3.1 (4), and the fact that ϵ³ > δ by (3.1). Lemma 3.3 allows us to write
We leave to the reader the proof of the following bound:
Recalling (4.4), we have thus shown that

Lemma 4.4 follows from Theorem 2.3 in [11], which addresses the question of when a solution of a martingale problem is also a solution of the associated SDE, for general Markov processes. For a more specific treatment of this question in our setting, the reader may consult Section 3.3 of [1], which sketches an adaptation of an elegant duality argument from [4] (see the proof of Lemma 1 therein).
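The Taylor–Lagrange step in the proof of Lemma 4.2 rests on the classical remainder bound |f(y) − f(x) − f′(x)(y − x)| ≤ (1/2) sup|f′′| · (y − x)² for smooth f. A quick numerical check on an arbitrary smooth test function (f = sin here, purely for illustration):

```python
import math

def taylor_remainder_bound(f, df, d2_sup, x, y):
    """Return (lhs, rhs) of the Lagrange remainder bound
    |f(y) - f(x) - f'(x)(y - x)| <= (d2_sup / 2) * (y - x)**2,
    where d2_sup is an upper bound on |f''| between x and y."""
    lhs = abs(f(y) - f(x) - df(x) * (y - x))
    rhs = 0.5 * d2_sup * (y - x) ** 2
    return lhs, rhs
```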
Concluding the proof of Theorem 2.1. We have established the convergence of (Y^N_t)_{t∈[2,T−1]} towards the solution of (1.2) when the killing probability is set to (2.10), the scaling factor q_{j+1} being an F^N_{τ_{j+1}}-measurable random variable. In order to prove Theorem 2.1, we only need to extend the result to the process with killing probability (2.9), where the factor r_{j+1} is deterministic.
By Lemma 6.1 in [16], for all j ∈ I, j ≥ j(2), on the event {ζ > τ_{j+1}}, it holds that
where we recall the definition of ∆_j in (4. Let (Y^{N,*}_j)_{j(2)≤j≤j(T−1)} be constructed from the population with killing probability given by (2.9), analogously to (Y^N_j)_{j(2)≤j≤j(T−1)} from the population with killing probability (2.10). We now define two processes bounding the process (Y^{N,*}_j)_{j(2)≤j≤j(T−1)} from above and from below. Let η ∈ (0, α) be arbitrarily small. We construct (Y^{N,1}_j)_{j(2)≤j≤j(T−1)} from the same population as (Y^{N,*}_j)_{j(2)≤j≤j(T−1)} as follows: every killing happening, say, at the instant t of a type j-mutation is cancelled with probability
1 − (α − η)r_j/(αq_j).
Note that for N large enough, the above is indeed positive thanks to (4.5) and Proposition 3.1 (4), and that this process has the same law as (Y^N_j)_{j(2)≤j≤j(T−1)} with weak selection coefficient α − η instead of α. Similarly, we construct (Y^{N,2}_j)_{j(2)≤j≤j(T−1)} by adding a killing at every time t of a type j-mutation at which no killing occurred, with probability
((α + η)/q_j − α/r_j) · X_j(t)/(X_j(t) + Y_j(t)),
and this process has the same law as (Y^N_j)_{j(2)≤j≤j(T−1)} with weak selection coefficient α + η instead of α (again, the above is positive by (4.5)). Furthermore, by construction, for all j ∈ I, j(2) ≤ j ≤ j(T − 1), it holds that
We know that (Y^{N,1}_j)_{j(2)≤j≤j(T−1)}, respectively (Y^{N,2}_j)_{j(2)≤j≤j(T−1)}, converges to the solution of (1.2) with α − η, respectively α + η, instead of α. Since this is true for all η ∈ (0, α), we deduce that the process (Y^{N,*}_j)_{j(2)≤j≤j(T−1)}, defined with the killing probability (2.9), converges to the solution of (1.2), thus proving Theorem 2.1.
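The sandwiching construction above is a thinning coupling: a killing proposed with probability p is kept with probability ρ, so the retained killings occur with probability pρ; taking ρ = (α − η)r_j/(αq_j) turns the (2.9)-killings into killings with weak selection coefficient α − η. A Monte Carlo sketch of this thinning identity (all numerical values are illustrative):

```python
import random

def thinned_killing_rate(p_base, retention, trials=200000, seed=3):
    """Each mutation triggers a killing with probability p_base, which is
    then cancelled with probability 1 - retention; the surviving killings
    should occur with probability p_base * retention."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(trials):
        # propose a killing, then independently decide whether to keep it
        if rng.random() < p_base and rng.random() < retention:
            kept += 1
    return kept / trials
```

With, say, α = 0.5, η = 0.1 and r_j close to q_j, the retention probability (α − η)r_j/(αq_j) is close to 0.8 and indeed lies in (0, 1), matching the positivity remark above.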