References and citation

Please cite the TreePPL Release Notes.

BibTeX reference:

@article {Senderov2023.10.10.561673,
author = {Senderov, Viktor and Kudlicka, Jan and Lund{\'e}n, Daniel and Palmkvist, Viktor and Braga, Mariana P. and Granqvist, Emma and {\c C}aylak, Gizem and Virgoulay, Thimoth{\'e}e and Broman, David and Ronquist, Fredrik},
title = {TreePPL: A Universal Probabilistic Programming Language for Phylogenetics},
elocation-id = {2023.10.10.561673},
year = {2024},
doi = {10.1101/2023.10.10.561673},
publisher = {Cold Spring Harbor Laboratory},
abstract = {We present TreePPL, a universal probabilistic programming language (PPL) designed for probabilistic modeling and inference in phylogenetics. In TreePPL, the model is expressed as a computer program, which can generate simulations from the model conditioned on some input data. Specialized inference machinery then uses this program to estimate the posterior probability distribution. The aim is to allow the user to focus on describing the model, and provide the inference machinery for free. The TreePPL modeling language is meant to be familiar to users of R or Python, and utilizes a functional programming style that facilitates the application of generic inference algorithms. The model program can be conveniently compiled and run from a Python or R environment, which can be used for pre-processing, feeding the model with the observed data, controlling and running the inference, and receiving and post-processing the output data. The inference machinery is generated by a compiler framework developed specifically for supporting domain-specific modeling and inference, the Miking CorePPL framework. It currently supports a range of inference strategies{\textemdash}including sequential Monte Carlo, Markov chain Monte Carlo, and combinations thereof{\textemdash}and is based on several recent innovations that are important for efficient PPL inference on phylogenetic models. It also allows advanced users to implement novel inference strategies for models described using TreePPL or other domain-specific modeling languages. We briefly describe the TreePPL modeling language and the Python environment, and give some examples of modeling and inference with TreePPL. The examples illustrate how TreePPL can be used to address a range of common problem types considered in statistical phylogenetics, from diversification and tree inference to complex trait evolution. 
A few major challenges remain to be addressed before the phylogenetic model space is adequately covered by efficient automatic inference techniques, but several of them are being addressed in ongoing work on TreePPL. We end the paper by discussing how probabilistic programming can facilitate further use of machine learning in addressing important challenges in statistical phylogenetics. Competing Interest Statement: The authors have declared no competing interest.},
URL = {https://www.biorxiv.org/content/early/2024/11/13/2023.10.10.561673},
eprint = {https://www.biorxiv.org/content/early/2024/11/13/2023.10.10.561673.full.pdf},
journal = {bioRxiv}
}

Zotero or Mendeley reference:

TY  - JOUR
T1 - TreePPL: A Universal Probabilistic Programming Language for Phylogenetics
JF - bioRxiv
DO - 10.1101/2023.10.10.561673
SP - 2023.10.10.561673
AU - Senderov, Viktor
AU - Kudlicka, Jan
AU - Lundén, Daniel
AU - Palmkvist, Viktor
AU - Braga, Mariana P.
AU - Granqvist, Emma
AU - Çaylak, Gizem
AU - Virgoulay, Thimothée
AU - Broman, David
AU - Ronquist, Fredrik
Y1 - 2024/01/01
UR - http://biorxiv.org/content/early/2024/11/13/2023.10.10.561673.abstract
N2 - We present TreePPL, a universal probabilistic programming language (PPL) designed for probabilistic modeling and inference in phylogenetics. In TreePPL, the model is expressed as a computer program, which can generate simulations from the model conditioned on some input data. Specialized inference machinery then uses this program to estimate the posterior probability distribution. The aim is to allow the user to focus on describing the model, and provide the inference machinery for free. The TreePPL modeling language is meant to be familiar to users of R or Python, and utilizes a functional programming style that facilitates the application of generic inference algorithms. The model program can be conveniently compiled and run from a Python or R environment, which can be used for pre-processing, feeding the model with the observed data, controlling and running the inference, and receiving and post-processing the output data. The inference machinery is generated by a compiler framework developed specifically for supporting domain-specific modeling and inference, the Miking CorePPL framework. It currently supports a range of inference strategies—including sequential Monte Carlo, Markov chain Monte Carlo, and combinations thereof—and is based on several recent innovations that are important for efficient PPL inference on phylogenetic models. It also allows advanced users to implement novel inference strategies for models described using TreePPL or other domain-specific modeling languages. We briefly describe the TreePPL modeling language and the Python environment, and give some examples of modeling and inference with TreePPL. The examples illustrate how TreePPL can be used to address a range of common problem types considered in statistical phylogenetics, from diversification and tree inference to complex trait evolution. 
A few major challenges remain to be addressed before the phylogenetic model space is adequately covered by efficient automatic inference techniques, but several of them are being addressed in ongoing work on TreePPL. We end the paper by discussing how probabilistic programming can facilitate further use of machine learning in addressing important challenges in statistical phylogenetics. Competing Interest Statement: The authors have declared no competing interest.
ER -

Another relevant paper is the concept paper on PPLs for phylogenetics. TreePPL is built on top of Miking, a language framework for constructing efficient compilers for domain-specific languages.

What is TreePPL?

TreePPL is a universal[1] probabilistic programming language (PPL) for evolutionary biology and phylogenetics.

The ultimate vision of probabilistic programming is to provide expressive model description languages, while at the same time supporting the automated generation of efficient inference algorithms. This allows empiricists to easily and succinctly describe any model they might be interested in, relying on the automated machinery to provide efficient inference algorithms for that model.
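The separation between model description and inference can be illustrated with a small Python sketch. This is not TreePPL code and does not reflect the TreePPL or Miking CorePPL API; the names `coin_model` and `importance_sampling`, and the convention that a model returns a value together with a log-weight, are assumptions made purely for illustration. The model is an ordinary function, and the inference routine is generic: it never inspects what the model does internally.

```python
import math
import random


def coin_model(rng, data):
    """Hypothetical model: a coin with unknown bias and a Uniform(0, 1) prior.

    Returns the sampled bias and the log-likelihood of the observed flips.
    """
    bias = rng.random()  # prior draw
    log_weight = 0.0
    for flip in data:
        p = bias if flip else 1.0 - bias
        log_weight += math.log(p) if p > 0.0 else -math.inf
    return bias, log_weight


def importance_sampling(model, data, n, seed=0):
    """Generic likelihood-weighted estimate of the posterior mean.

    Works for any model with the (rng, data) -> (value, log_weight) shape.
    """
    rng = random.Random(seed)
    runs = [model(rng, data) for _ in range(n)]
    max_lw = max(lw for _, lw in runs)  # subtract the max for numerical stability
    weights = [math.exp(lw - max_lw) for _, lw in runs]
    total = sum(weights)
    return sum(value * w for (value, _), w in zip(runs, weights)) / total


# Eight heads and two tails: the exact posterior is Beta(9, 3), with mean 0.75.
observed = [True] * 8 + [False] * 2
estimate = importance_sampling(coin_model, observed, n=20000)
print(f"posterior mean of bias is approximately {estimate:.3f}")
```

Swapping in a different model function requires no change to `importance_sampling`, which is the property that lets a PPL supply the inference machinery automatically. Real systems use considerably more sophisticated strategies, such as sequential Monte Carlo and Markov chain Monte Carlo.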

Current probabilistic programming languages (PPLs) are often difficult for empiricists to use. Furthermore, although PPL inference strategies are now (as of 2025) progressing swiftly, in many domains there is still a substantial gap to close before PPL systems can compete successfully with dedicated software, or even provide computationally feasible solutions.

The design principles of TreePPL are as follows:

  1. TreePPL should be easy to use for empiricists. A source of inspiration in this context is WebPPL, which we think is one of the most accessible PPLs in terms of syntax. Beyond an intuitive syntax, TreePPL also needs to have extensive support for model components that are commonly used in phylogenetics.

  2. TreePPL should provide state-of-the-art efficiency in the inference algorithms it generates from phylogenetic model descriptions. TreePPL should also support advanced users who want to experiment with inference algorithms or develop entirely new inference strategies for phylogenetic models.

  3. TreePPL should provide a number of pre-implemented models that users can use as starting points.

  4. Phylogenetic data should be easy to handle in TreePPL.

We aim TreePPL primarily at computational biologists and bioinformaticians. However, because the language is universal, empiricists from all domains are welcome to experiment with it and join the effort.

Footnotes

  1. A universal PPL is a PPL in which the number of random variables does not have to be known at compile time; that is, random choices made at runtime can lead to new random variables being sampled.
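
The property described in this footnote can be seen in a short Python sketch of a pure-birth (Yule) process. This is an illustration only, not TreePPL syntax; the function name `simulate_yule` and the `trace` list that records each draw are hypothetical. Every lineage draws a waiting time to its next speciation event, and whether further draws happen at all depends on the value just drawn, so the total number of random variables differs from run to run.

```python
import random


def simulate_yule(birth_rate, time_left, rng, trace):
    """Simulate one lineage of a pure-birth process and return its number of tips.

    Each call draws one waiting time and appends it to `trace`, so the length
    of `trace` is the number of random variables sampled in this run.
    """
    wait = rng.expovariate(birth_rate)
    trace.append(wait)
    if wait >= time_left:
        return 1  # lineage survives to the present without speciating
    remaining = time_left - wait
    # Speciation: both daughter lineages evolve independently for the remaining time.
    left_tips = simulate_yule(birth_rate, remaining, rng, trace)
    right_tips = simulate_yule(birth_rate, remaining, rng, trace)
    return left_tips + right_tips


rng = random.Random(0)
results = []
for _ in range(50):
    trace = []
    tips = simulate_yule(birth_rate=1.0, time_left=2.0, rng=rng, trace=trace)
    results.append((tips, len(trace)))

for tips, n_draws in results[:5]:
    print(f"tips={tips}, random variables sampled={n_draws}")
```

A tree with n tips comes from exactly 2n - 1 draws, and n is itself random, so no fixed-size list of random variables could describe this model ahead of time.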