MathematicS In Action

Sharp prediction of extinction times is needed in biodiversity monitoring and conservation management. The Galton–Watson process is a classical stochastic model for describing population dynamics. Its evolution is like the matrix population model where offspring numbers are random. Extinction probability, extinction time, abundance are well known and given by explicit formulas. In contrast with the deterministic model, it can be applied to small populations. Parameters of this model can be estimated through the Bayesian inference framework. This enables to consider various scenarios. We show how coupling Bayesian inference with the Galton–Watson model provides several features: (i) a flexible modelling approach with easily un-derstandable parameters (ii) compatibility with the classical matrix population model (Leslie type model) (iii) An approach which leads to more information with less computing iv) inclusion of expert or previous knowledge. . . It can be seen to go one step further than the classical matrix population model for the viability problem. To illustrate these features, we provide analysis details for a real life example with French Pyrenean brown bears.


Introduction
Habitat degradation is pointed out as the major cause of extinction for many threatened species around the world.Yet, for species that are exploited for food or other purposes (e.g., exploited fish populations, bushmeat, etc.), or purposely destroyed (e.g., large predators), extinction often occurs in intact habitats.For these cases, the assumption that population dynamics takes place in a fixed habitat seems reasonable.Then, assuming a given environment and a short period of time without brutal variation of the environment, are we able to propose a model to support decision making for management?
The simulations are frequently used to work on these questions even if some works highlight the use of Mathematical results for these problems as [11,15,20,25,32] among many others.
One of the most famous and powerful mathematical settings for Population Viability Analysis (PVA) is the Matrix population model.They were introduced by [3,26,27] and they have become a classical tool to predict the evolving size of a population as illustrated in the books [11,39].When the environment is assumed unchanged/stable, some issues listed hereafter are not currently being resolved successfully.We reformulated those issues in terms of elementary questions.
(I) What are the chances on saving a small population?How probable is extinction?For how long can the small population survive?
(II) What is the expected short-time evolution of the numbers of individuals?
(III) How can we proceed to avoid extinction [12,13], to maintain a population size [36] or wipe out some population [5,17,28].For instance, how many re-introductions are needed to reach a threshold of viability for the population with high probability?
We propose a reformulation of the matrix population model to be viewed as a Bayesian version of a Galton-Watson modelling that allows to address these issues.

Matrix Population Model
The state of the population, at a certain time t ∈ N = {0, 1, . . .}, is given by a (column) vector, N (t) = (N 1 (t), . . ., N K (t)) ⊺ , where K is the number of types (and the exponent ⊺ designs the usual transposed vector).This vector evolves from time t to t + 1 with the following matrix product: that is, for every type j,

.1)
Matrix A is called the population projection matrix and the entry A i,j tells how many individuals of type i appear per individual of type j.Classically the types correspond to the age (or stage or location), and A then codes the development (or the migration), the birth and the death.The matrix A is sometimes partitioned into two sub-matrices of probabilities: one for transitionsurvival (TS) plus one for recruitment (R) : N j (t + 1) = K i=1 TS i,j N i (t) + K i=1 R i,j N i (t).In this work, A includes the whole (transition, survival and recruitment).

Multiple Galton-Watson as a Matrix Population Model
Multi-type Galton-Watson processes, were introduced in [4,40], and can be seen as a matrix population model with random number of offspring (but with non-random environment).In this approach, at each time t, each individual l of type i gives a discrete random number ξ i,j,l,t ∈ {0, 1, 2, . . ., κ i,j } of children of type j.Here κ i,j ∈ N \ {0} represents the maximal number of offspring (of type j from an individual of type i); it is assumed to be finite.It is also assumed a stationarity of the process (i.e.parameters are not time varying) and independence between observation times and individuals.Denote N i,j (k, t) = N i (t) l=1 1 {ξ i,j,l,t =k} , then for t ∈ {0, . . ., T }, these assumptions imply : where probabilities p i,j = (p i,j (0), . . ., p i,j (κ i,j )) represent the distribution law for the progeny of type j from an individual of type i.More precisely, an individual of type i will give, after one unit of time, k individuals of type j (including potentially himself), with probability p i,j (k).We further assume that all sampling in (2.2) are independent.For simplicity, formula (2.1) turned into Of course setting we recover the model matrix form of equation (2.1) that is With this formalism, A( • ) is a random matrix changing at each time.However it is not the same as in [39,Chapter 3] because this sequence of matrices is not a sequence of independent and identically distributed matrices.To see how demography or environment play a role in this matrix notation and to see the importance of considering demographic stochasticity (even in larger populations), we can see [15] and references therein.

Prior of the Multiple Galton-Watson model
We will then suppose that (p i,j ) 1≤i,j≤K is distributed according to some probability distribution, usually called the prior.This prior will be the convolution of Dirichlet distribution [34, Section A.8 p. 521]; namely p i,j are independent from each other and we have: This is a classical law on discrete probability.Parameters (α i,j ) are called hyper parameters and correspond to the information to incorporate in the estimation.They are fixed by the users and can take into account previous studies or expert knowledge.Among all the choices, three main strategies can be chosen.
(1) The first one corresponds to the non-informative choice.If we have no information on the ecological system, we wish to decrease the effects of prior outcomes.We then take p i,j as drawn uniformly at random over all probability vectors.This corresponds to the choice α i,j (0) = • • • = α i,j (κ i,j ) = 1.Namely, we consider as prior, the uniform law on the simplex of probability measure on a discrete space.This choice is motivated by the minimization of the entropy of these distributions (See [34, Section 3.2.3]and [35,Section 6]).Roughly, it is the more random choice or the less informative one.
(2) Another choice can consist of putting information in the hyperparameters (α i,j ).Indeed, as Dirichlet distribution is uni-modal (at least for large parameters) then one can easily choose (α i,j ) so that E[p i,j (k)] is equal to a presupposed valued m k plus or minus a fixed error estimate σ (uniform in k).Values of m k and σ can for instance be taken from a previous study where data are not available but just the estimator with a confidence interval as in [12,13].
If we expect that p i,j = m i,j (k) ± ϵ i,j (k) and ϵ i,j (k) is centred and has variance of magnitude This choice enables to match the expectation to the variance.More precisely, to match the expectation, we need α i,j (k) with S i,j = κ i,j k=0 α i,j (k).We have then α i,j (k) = S i,j m i,j (k).To match the variance, we have S i,j m i,j (k)(S i,j − S i,j m i,j (k) S 2 i,j (S i,j + 1) .
Of course we need σ 2 i,j < 1.However, if this is not the case, the noise is too large and the information on p i,j is therefore less informative than no information.
(3) Finally, the expert choice can be driven by external data and the weight of the expertise could be automatic.Indeed, Parameters (α i,j ) can incorporate belief which does not come from any data.They can represent the balance between data and expertise.If the experts think that p i,j ≈ q i,j and that their opinion is as important as M data then one can choose α i,j = M × q i,j .

Posterior
We assume that our demographic data has the form of life tables (n i,j (k, t)) 1≤i,j≤K,k,t≥0 ; where n i,j (k, t) corresponds to the number of individuals at time t of type i which gives k individuals of type j at time t + 1.We assume to know all the population information over the time interval {0, . . ., T }; namely our data is n = (n i,j (k, t)) 1≤i,j≤K,k≥0,T ≥t≥0 .Due to the well known conjugation property between multinomial and Dirichlet distribution (see for instance [34,35]), and the form of the Galton-Watson transitions and the Markov property, the form of the posterior is simple (see [30,Section 3] and [19, p. 1284] for a detailed computation).More precisely, for every i, j, the posterior of the parameter p i,j conditioned on the data n are (2.5) This posterior sums up all the information we have coming from the data and our population model.To answer our main questions, it remains to summarize this object to the principal objects of interest (chance of extinction, extinction time, abundance, etc.).According to the objectives of the outcomes, many choices of modelling are over possible and we will describe three different ones: use of posterior estimators for p i,j (mean or mode), use of posterior simulations of p i,j to construct a scenario approach and use of the posterior to compute analytical integration of p i,j .

Random Mean Matrix as a generalisation of the Population Projection Matrix
An important quantity associated with Galton-Watson model is the random mean matrix M whose entries are It is a random matrix since p i,j are supposed to be random in the Bayesian framework.This matrix is the counterpart to the population projection matrix in matrix population model [11,39].
In the particular case where types correspond to ages or stages (and individuals can only get old or reproduce), then we recover that M is the classical Leslie matrix.
Under general assumptions [21, Section 2.3.1], the evolution of N (t) when t becomes large only depends on the largest eigenvalue λ of this matrix and the two associated eigenvectors u, v (with positive coordinates).That are those verifying and As explained in detail in [34,Section 4], the posterior mean, the posterior mode, the posterior median, etc. can be used to build an estimator of M .Note that some of these types of estimator can be easily computed; for instance The main drawback of this approach is that it loses the error estimation which is naturally embedded in the posterior.However, using punctual estimation allows to compute λ and gives an estimation of the viability.Another way is to compute the Probability of viability from posterior simulation: Do n prec times the steps • Simulate the K 2 probability valued random variables p i,j .
• Take for λ the largest eigenvalue of the M defined by (3.1).
Then count the proportion of λ being larger than 1.

Probability of Extinction
On the event λ > 1, extinction occurs with positive probability (but not equal to 1).In fact, if Ext denotes the extinction event then we have where is the generating function associated with p; for i ∈ {1, . . ., K}, it is defined by In particular, when we start with only one individual with type i (that is N i (0) = 1 and N j (0) = 0 for j ̸ = i) then The variable s i is the probability of extinction starting from one individual with type i.
The posterior distribution of (p i,j ) can be used to simulate the distribution of the probability of extinction or to compute punctual estimation of this probability of extinction at different quantiles of the posterior.
In practice, to compute the probability of extinction from posterior simulation: Do n prec times the steps • Simulate the K 2 probability valued random variables p i,j .
• Find the (vector) solution s of equation φ(s) = s, with φ is the generating function defined in (3.5).To do this step one can use any classical optimization algorithm.
• Calculate s Then do a mean of these quantities.Another option is to calculate P(Ext | (n)), directly integrating Equation (3.4) under the Dirichlet distribution given in Equation (2.5).This has the advantage of not being conditioned on one (p i,j ) because we take a mean value of all (p i,j ) combinations, using our posterior.This perfectly describes the prediction randomness coming from the demographic stochasticity and the statistical estimation.

Time to Extinction
On the event λ < 1 then the process goes to extinction; that is there exists a finite random time T Ext such that N (t) = 0 for all t ≥ T Ext (and N (T Ext − 1) > 0).Even if one can estimate T Ext by simulation, sharp analytic bounds have been proved.For instance, with probability 1 − α, for some fixed threshold α, do n prec times the steps • Simulate the K 2 probability valued random variables p i,j .
• Calculate the function where v is given in Equation (3.2).
Then do a mean value of functions U and L and choose (T − , T + ) such that These two functions are based on Equation (3.6) and Lemma 6.1.Note that these bounds on extinction time only hold when λ < 1.To avoid some computational problems due to the fact that P(λ < 1) > 0 (even if it is very small), it can be useful, when calculating the extinction time conditioned on the event {λ < 1}.In the previous algorithm, this conditioning translates into doing a mean value only for λ satisfying λ < 1.

Further results
How to plan reintroduction: Do n prec times the steps • Simulate the K 2 probability valued random variables p i,j .
• Find the (vector) solution s of equation φ(s) = s, with φ is the generating function defined in (3.5).To do this step one can use any classical optimization algorithm.
Then do an histogram or smoothed density through classical kernel density methods for instance.Additionally, Process (N (t)) is a Markov process.That means that for a current time T , the future evolution of the population only depends on the distribution of the population at time T and not on periods of time in the past (conditionally on (p i,j ) of course).As a consequence, Formula (3.4) and Formula (3.6) hold when replacing N (0) by N (T ) and conditioning on N (T ) > 0. If we know that some individuals remain at time T , then we have to take into account this information.
Finally, in addition to the spectral analysis, all results of a matrix population model (sensitivity analysis,. . . ) can be applied using the matrix M for the study of the Galton-Watson process.Also, for short time, the Galton-Watson process evolves in the same way as its associated deterministic model up to error around N (t).Consequently for short time projection, matrix population model and Galton-Watson should give similar results.

Context
The Pyrenean brown bear (Ursus arctos) population is considered as one of the most seriously threatened with extinction in Western Europe.In the 90's and early 2000s, the reinforcement of the population by the introduction of a few individuals from Slovenia allowed to avoid population extinction.Yet, the population remains fragile, and the introduction of new individuals gives rise to tumultuous debates in France.A previous study [13] established that "successful conservation strategies for this population would require releasing more bears in both sub-populations in the near future".The authors used a deterministic matrix model for estimation and stochastic simulation for prediction.Let us give here a short study on the powerful properties of our Bayesian approach which used a stochastic matrix model and posterior distribution for both estimation, prediction and simulation.The data we used comes from [9].It corresponds to the exhaustive supervision of all the population between 2005 and 2016 of the French Pyrenean brown bears.This population is split Table 4.1.Posterior for all parameters p i,j (k) and the vector valued p 5,1 for the bears population example.We use the data [9] and the non-informative prior.

Parameter
Posterior Mean IC 90% p 1,2 (1) Beta (18,4)  into two different isolated subpopulations.In the western part of the Pyrenees, before 2018, there were only two males (of which only one is indigenous), therefore we will focus on the subpopulation living in the central part of the Pyrenees.

Modelling assumptions
We consider the same structure model as [13].Namely, we only consider the evolution of the number of females and we consider a population structured by age, with K = 5 classes.For classes i ∈ {1, . . ., 4}, this corresponds to bears whose age is i−1 years.For i = 5 this corresponds to the bears whose age is greater than or equal to 4 years.The life of a bear is modelled as follows: during its first 4 age stages, it can either die or pass to the next age.When it is in the last fully developed stage, it can die, survive and reproduce.See [13] for biological motivations.We assume that surviving and reproducing are two independent events even if generally it is assumed that reproduction is only possible for surviving parent.Both choices are modelling preferences that marginally add a bias to the prediction.It can have an importance mostly for the interpretation of the estimated parameters but we are not doing this here.Indeed, as we estimate our parameters with this assumption, it decreases the values of reproducing parameters (because we will not see offspring when parents die) but it will increase the offspring predictions (because it is possible to have offspring when parents die).These two facts are balanced because they are assumed for the estimation and for the prediction.In the same way, we assume that females can reproduce each year although it is an exceptional event.This changes nothing because it will naturally divide by 2 the reproduction rate.

Results
Using Formula (2.5), the posterior law is easily calculable from the data of [9] and is then given in Table 4.1.The Beta distribution can be seen as a particular case of Dirichlet distribution.
We can now see the evolution of the population over short time spans.To illustrate this, let us represent for each year t, the prediction we give from the knowledge of the previous years 2005, . . ., t − 1 (that is we do not use all the data set to estimate (p i,j )).We represent these successive predictions in Figure 4.1.Each boxplot represents the distribution of the prediction.Boxplots and diamonds both represent prediction of the number pf females at each year considering data available on the previous years and a non-informative prior.The boxplot represents the law of the prediction and diamond represents the mean estimator given by Formula (3.3).
The mean estimator given by Formula (3.3) was added (diamond symbol).To compare with the genuine evolution of the population, we add the total number of female bears at each year (red circles).In this figure, we can for instance see the learning step of our algorithm.
Let us now focus on λ; that is the viability problem.For this model the principal eigenvalue λ is then the solution of the equation 1)p 2,3 (1)p 3,4 (1)p 4,5 (1) = 0, which has nevertheless no explicit solution (this equation comes from the fact that λ is just the root of the characteristic polynomial).However, numerically, using posterior estimation of (p i,j ) one can find P( λ ≤ 1) = 0.012.Hence, with high probability the process is super-critical and the reproduction capacity seems to guarantee viability.However if the population is too small, the survival is not guaranteed.Again, surviving probabilities s in Equation (3.4) have no explicit solutions but are functions of p i,j and one can use simulated values from the posterior to find that, with the current population size, the probability of extinction can be rounded to 0. It seems that the size of the population and the reproduction capacity is large enough to guaranty survival without reintroduction in the central part of the Pyrenees.Let us emphasize again that this result is only valid under our main assumptions.That includes no change in the environmental conditions including hunting or other human interference.Note that we also assumed no change in the basic demographic parameters that could be due to inbreeding depression.In fact, the current population is highly inbreed [9].However one can still see in Figure 4.2 that it is more beneficial to reintroduce older bears than younger ones.
In particular, in 2018, two females were reintroduced in the western part of the Pyrenees.Assuming similar evolution, there is around 15% risk of extinction for this sub-population (of this two males and 2 reintroduced females).Additionally, if these females are pregnant and then if they survive and give birth each to one female cub then this risk will be divided by 2 (around 8%).Closely related to theses questions, using Formula (3.4), we can calculate the effective population size; namely the number of reproducing individuals we need to have an extinction risk lower than a defined threshold.Here, to have an extinction probability lower than 0.05 then we need at least 5 reproducing females.

Sensitivity to the Prior
We only have 11 years of data and one would think that this is not sufficient to estimate all the parameters we need.Let us show how we can integrate the statistical estimation of [13].The data they used come from different articles, where the full description of the estimation procedure is not always available.If we look at the infantile mortality rate p 1,2 (0) then it was based on [41].Their estimation is 0.4 for a population of 150 bear cubs.The resulting estimated variance is then 1.96 0.4 × 0.6/150 = 0.0784.To take into account this study, we can then choose = 0.7855.
Using these hyper parameters and the data, we find that the posterior law is a Beta(18.2417,3.7855).This is very close to the non-informative estimation Beta (18,4) given in Table 4.1.Hence, even if the sample size data seems limited,there is enough information to limit the sensitivity of the prior.We then restrict ourselves to the non-informative prior because it gives similar results and several incomes are unknown as for instance for choosing (α 1,5 ).

Missing Data or Information
In general, we do not know the sex of the bear cub before until a few years after their birth.Consequently, we do not know n 5,1 (•, t) for the last year of observation, and these data were not used in the previous estimation.However, one can slightly change the Bayesian model to take into account this missing information.Indeed, one can add some structure and assume that where q is the law of the number of spring for a female bear (male and female bear cubs) and p is the probability that a cub is a female.Using this (natural) representation then we keep the conjugation property, and all the posterior remains explicit.For the dataset [9], this leads, for instance, to 0.583 as posterior mean estimator for the sex ratio.This is slightly more favourable than the estimation of 0.5 used in [13].Using the number of males in the population and supposing that the survival rates are similar can reduce the variance of our estimation; this is a compromise between the data and expertise.

Discussion
We have shown how the Galton-Watson process with Bayesian inference is a simple but powerful setting for PVA analysis.On top of giving the same information as the matrix population model, it easily gives precise results for probability of extinction, time to extinction, etc.It is perfectly adapted for small populations and then perfectly completes previous studies, see [8,15,18,23,32].Deterministic models including differential equation and matrix products are well suited to large population (but sometimes more than 10 6 individuals [10]).Stochastic differential equations or hybrid models (including piecewise deterministic models) are more appropriate at a mesoscopic scale; at least for a thousand individuals.Few models are concerned with smaller population size although these populations are more concerned by extinction.For small data sets, our approach takes into account demographic stochasticity and enables to add expert knowledge in the prior (via (α i,j (k)) setting) and improves posterior estimation and prediction.
The Bayesian setting we present can also be adapted for missing data or less informative data.In the use case, in addition to the viability study, we can integrate the sex ratio of the bear cubs as a latent variable in the model and estimate its value as 0.583.The combination between Galton-Watson processes and Bayesian, and known mathematical results [1,21,22,29,30] implies simple formulas and there is no need for large simulations to answer simple questions.Results are found without using long computational algorithms with unknown errors.Consequently, it is made for small data but easily scalable to big data : in contrast with ABC methods [16] or other MCMC methods [37], there is no need for a large number of simulations; this permits us to use this method for large datasets.One can easily use this model as part of a larger model to motivate control of the population or others strategies.Furthermore, there is no doubt about the convergence of this type of algorithm and the associated errors due to such computations.
For conservation purposes, there is more and more population monitoring and there is more and more exhaustive information to understand population evolution.Nevertheless, classical statistical methods do not use this exhaustive information to calibrate demographic model.The Bayesian approach we present offers the advantage of taking into account all the information of an exhaustive dataset to combine estimation, prediction and simulation in one unify approach, conditioned on the data (posterior), and using the expert knowledge (prior).All the results are calculated or simulated from the posterior distribution (a Dirichlet distribution).In the use case of French Pyrenean brown bears, the posterior distribution is given in Table 4.1 and the viability analysis based on this posterior indicates that the reproduction capacity seems to guarantee viability.Those results come from a Bayesian approach and are therefore conditioned by the data and the model assumptions.
This modelling is founded on several assumptions and in general, some of them can be questionable such as the absence of carrying capacity, the independence of individuals and the static environment but they are very natural considering a small population during short time period.This model takes into account statistical errors and the demographic stochasticity.However, the environmental stochasticity is not directly included in this estimation even if the Bayesian learning picks up some part of it with the data.Genetic aspects that can be of importance for small populations is also not taken into account.Considering the example of the bears, we do not speak about the male reintroduction which can have a real interest to avoid inbreeding depression.We focus on demographic aspects and numbers.Including the inbreeding issue in our setting may be feasible and could provide interesting insights [14,22].
An other assumption that we have made for bears example, which can be restrictive, is that surviving and reproducing are independent, but for prediction in contrast with parameter interpretation, this is not a real problem.Nevertheless, one can extend our setting to model the necessity of survival to make reproduction possible.The assumption that all p i,j are independent from each other can be removed.From [30], we can extend Formula (2.5) for the following setting: for every k 1 , . . ., k K , we replace all p i,j (k) by p i (k 1 , . . ., k K ) which is the probability that an individual with type i has k 1 offspring of type 1 and k 2 offspring of type 2 etc.It remains to assume that all p i are independent and keep a Dirichlet law (on K j=1 {1, .., κ i,j } instead of {1, . . ., κ i,j }) as prior to have the same type of approach.
Finally, in many examples coming from ecology, the number of offspring may be very large and it is not possible to fix any value κ i,j for the maximum numbers of offspring.It is therefore not possible to directly use our approach unless we take a large κ i,j (this requires too much information).In order to keep the same idea of our approach, we can suppose that offspring are given by a Poisson law P(m) (which arises naturally as κ i,j → ∞ as a limit of our model under general assumptions [2]).Also the naturally associated prior will be the Gamma law.It also verifies properties (conjugation, non-informative, expertise, etc.) which ensure an explicit posterior distribution.
6. Appendix 6.1.A lower bound for the extinction time Lemma 6.1.We have Proof.The proof is inspired from [21, Box 5.2 p. 119] We have , where X(t) = K j=1 v j N j (t).Yet, we have E[X(t)] = λ t K j=1 v j N j (0), and it remains to study the second moment.Recall that Setting N(t) = j N j (t) for the population size and reindexing the population, we find N j (t + 1) = N(t) l=1 ζ i(l),j,l,t , where i(l) is the type of the individual l and ζ its offspring.We have In the third line we conditioned on N(t) and use the independance property.In the fourth line, we use the notation M (2) i,j,j ′ = E (ζ i,j,l,t ) ζ i,j ′ ,l ′ ,t .If j ̸ = j ′ then, under our assumptions, we have i,j,j ′ = M i,j M i,j ′ , and either
Finally, we obtain

Figure 4 . 1 .
Figure 4.1.Red circles represent the total number of female bears at each year.Boxplots and diamonds both represent prediction of the number pf females at each year considering data available on the previous years and a non-informative prior.The boxplot represents the law of the prediction and diamond represents the mean estimator given by Formula (3.3).