Fixed-Size Determinantal Point Processes Sampling For Species Phylogeny
MathematicS In Action, Volume 10 (2021) no. 1, pp. 1-13.

Determinantal point processes (DPPs) are popular probabilistic tools for modeling repulsiveness. They provide coherent models when negative correlations arise, and they give rise to new algorithms for inference problems such as sampling, marginalization and conditioning. Recently, DPPs have played an increasingly important role in machine learning and statistics, where they are used for diverse subset selection problems. In this paper we use the $k$-DPP, a conditional DPP that models only sets of cardinality $k$, to sample a diverse subset of species from a large phylogenetic tree. Tree sampling is an important task in many studies in modern bioinformatics. We present a fast mixing sampler for the $k$-DPP and give a polynomial bound on its mixing time. The approach is applied to a real-world dataset of species, and we observe that leaves joined by a higher subtree are more likely to appear in the sample.
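
The abstract does not spell out the sampler, so the following is only a minimal sketch of the general idea, assuming a generic swap-based Metropolis-Hastings chain whose stationary distribution is the $k$-DPP, $P(S) \propto \det(L_S)$ with $|S| = k$. The function name `kdpp_swap_chain`, its parameters, and the uniform swap proposal are illustrative assumptions and may differ from the chain analysed in the paper.

```python
import numpy as np


def kdpp_swap_chain(L, k, n_steps, rng=None):
    """Illustrative swap chain targeting P(S) proportional to det(L_S), |S| = k.

    L is assumed to be a symmetric positive semidefinite kernel matrix.
    This is a generic sketch, not the paper's exact sampler.
    """
    rng = np.random.default_rng(rng)
    n = L.shape[0]
    # Start from a uniformly random subset of size k.
    S = list(rng.choice(n, size=k, replace=False))
    det_S = np.linalg.det(L[np.ix_(S, S)])
    for _ in range(n_steps):
        # Propose replacing one item of S by one item outside S.
        pos = rng.integers(k)
        outside = [j for j in range(n) if j not in S]
        new_item = outside[rng.integers(len(outside))]
        T = S.copy()
        T[pos] = new_item
        det_T = np.linalg.det(L[np.ix_(T, T)])
        # The proposal is symmetric, so accept with probability
        # min(1, det(L_T) / det(L_S)).
        if rng.random() * det_S < det_T:
            S, det_S = T, det_T
    return sorted(int(i) for i in S)
```

With a kernel built from pairwise similarities between species, a call such as `kdpp_swap_chain(L, k=10, n_steps=5000)` would return the indices of one size-10 subset; how many steps are needed in practice is exactly what a mixing-time bound addresses.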

DOI: 10.5802/msia.13
Keywords: Determinantal point process, Kernel, Markov chain, Metropolis-Hastings, Mixing time, Phylogenetic tree
Diala Wehbe 1; Nicolas Wicker 2; Baydaa Al-Ayoubi 3; Luc Moulinier 4

1 Paul Painlevé Laboratory, University of Lille, 59650 Villeneuve D’Ascq, France and EDST, Lebanese University, Tripoli, Lebanon
2 Paul Painlevé Laboratory, University of Lille, 59650 Villeneuve D’Ascq, France
3 Faculty of Sciences, Lebanese University, Rafic Hariri University Campus - Hadas, Lebanon
4 ICube, CSTB (Complex Systems and Translational Bioinformatics), University of Strasbourg, 67085 Strasbourg, France
