R interface
treepplr is an R interface to the TreePPL program. All of its functions start
with tp_, which makes them easy to distinguish from functions in other packages.
An analysis with TreePPL needs three parts: a model, data, and inference machinery.
Model
You can choose a model written in the TreePPL language from our library, or you can write your own.
To list all models available in the library, use tp_model_library(), which retrieves
them from the TreePPL GitHub repository.
model_lib <- tp_model_library()
To use one of these models, you only need its name, as listed in model_lib$model_name.
If you want to use your own custom model, you will need to write it in the TreePPL
language and create an R object that holds either the full path to the
.tppl file containing the model or a string with the full model.
# import a model from file
model_path <- "path/to/my_model.tppl"
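The second option, a string with the full model, can be built from a file with base R. The following is only a sketch: the file content is placeholder text rather than a valid TreePPL model, and model_string is a name chosen here for illustration.

```r
# Write a placeholder file so the example is self-contained.
# The content is NOT a valid TreePPL model; replace it with your own source.
model_path <- tempfile(fileext = ".tppl")
writeLines(c("my custom model", "(TreePPL source goes here)"), model_path)

# Option 1: keep the full path to the .tppl file
model_path <- normalizePath(model_path)
stopifnot(file.exists(model_path))

# Option 2: read the whole file into a single string
model_string <- paste(readLines(model_path), collapse = "\n")
```

Either model_path or model_string would then be the object you hand to the later steps.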
Data
TreePPL only reads a custom JSON format, so treepplr converts several kinds of
input data to this format and writes the result to a file, which TreePPL then reads.
Here are some examples:
# for models that only need a phylogenetic tree
phylo <- ape::read.tree(file = "path/to/your/file.tre")
data_path <- tp_data(data_input = phylo)
# or sequence data
fasta_file <- "path/to/your/file.fasta"
data_path <- tp_data(data_input = fasta_file)
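If you want to try these calls before you have your own data, you can write small input files with base R first. This is a sketch under stated assumptions: the tree and sequences are made-up toy values, and the temporary file names are arbitrary.

```r
# Toy Newick tree with three tips (made-up branch lengths)
tree_file <- tempfile(fileext = ".tre")
writeLines("((A:1.0,B:1.0):0.5,C:1.5);", tree_file)

# Toy FASTA alignment with the same three taxa (made-up sequences)
fasta_file <- tempfile(fileext = ".fasta")
writeLines(c(">A", "ACGTACGT",
             ">B", "ACGTACGA",
             ">C", "ACGAACGA"), fasta_file)

# Quick sanity check: one header line per sequence
fasta_lines <- readLines(fasta_file)
n_seqs <- sum(startsWith(fasta_lines, ">"))
```

These paths can stand in for "path/to/your/file.tre" and "path/to/your/file.fasta" in the calls above.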
As with models, you can also use test datasets from the TreePPL library by passing the name of the model (we'll come back to this later).
Inference method
TreePPL offers a variety of inference methods, and different methods work best for different models. See the model library for our recommendations on which inference method to use for each model.
Compilation
Once you have chosen the model and the inference method, you can compile your model to an executable that also contains the machinery needed to run the chosen inference method.
# Using a model from the library and a Sequential Monte Carlo method
exe_path <- tp_compile(model = "crbd", method = "smc-apf", particles = 10000)
# Using a custom model and a Markov chain Monte Carlo method
exe_path <- tp_compile(model = model_path, method = "mcmc-lightweight",
                       iterations = 10000)
Running
Now you are ready to run your analysis. All you have to do is pass your data to the compiled executable and choose how many independent runs you want.
output <- tp_run(compiled_model = exe_path, data = data_path, n_runs = 4)
Convergence
Then you can parse your output to produce a data frame and check for convergence.
# If using SMC
output_df_smc <- tp_parse_smc(output)
tp_smc_convergence(output_df_smc)
# If using MCMC
output_df_mcmc <- tp_parse_mcmc(output)
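For MCMC output from several independent runs, one generic diagnostic is the Gelman-Rubin potential scale reduction factor (R-hat), which compares between-run and within-run variance. The sketch below is a base-R illustration on simulated draws, not treepplr's own method: rhat() and the chains matrix are hypothetical, and how to extract one parameter per run from output_df_mcmc depends on its actual columns.

```r
# Gelman-Rubin R-hat for one scalar parameter.
# `chains` is a numeric matrix: rows = iterations, columns = independent runs.
rhat <- function(chains) {
  n <- nrow(chains)                      # iterations per run
  chain_means <- colMeans(chains)
  B <- n * var(chain_means)              # between-run variance
  W <- mean(apply(chains, 2, var))       # mean within-run variance
  var_hat <- (n - 1) / n * W + B / n     # pooled variance estimate
  sqrt(var_hat / W)
}

set.seed(1)
# Simulated stand-in for 4 runs of 1000 draws from the same distribution
chains <- matrix(rnorm(4000), nrow = 1000, ncol = 4)
r_good <- rhat(chains)

# Same draws, but one run is shifted to mimic a run stuck elsewhere
chains_bad <- chains
chains_bad[, 4] <- chains_bad[, 4] + 3
r_bad <- rhat(chains_bad)
```

Values close to 1 suggest that the runs agree; clearly larger values, as for chains_bad here, suggest that they have not converged to the same distribution.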
Post-processing
Different models produce different outputs and thus require different post-processing. See the model-specific tutorials for ways to process your TreePPL output.