Biogeographical Stochastic Mapping Example Script

Introduction

(link to this section)

I have now added Biogeographical Stochastic Mapping (BSM) to my basic collection of BioGeoBEARS code, now that it has been validated (see BioGeoBEARS validation).

When I get a chance to do a CRAN update (a major project, as I have to thoroughly revise the documentation etc.), it will be available there as well. For now, you should source() the full collection of update R files in the example script (which you should do anyway, after each time you invoke library(BioGeoBEARS)).

See my introduction and example animation graphics for Biogeographical Stochastic Mapping here: Stochastic mapping under biogeographical models (updated April 2015).

Citation

(link to this section)

For now, if you use this for research, please cite:

Matzke, Nicholas J. (2016). "Stochastic mapping under biogeographical models." PhyloWiki BioGeoBEARS website, 2016,
http://phylo.wikidot.com/biogeobears#stochastic_mapping .

Clarifications about Biogeographic Stochastic Mapping (BSM)

(link to this section)

Brief notes (hopefully written up more later):

  • Because of the programs SIMMAP and phytools, some people think of stochastic mapping as an alternate model to ancestral range inference, or they think that Biogeographic Stochastic Mapping is the same thing as what SIMMAP and phytools are doing. In reality:
    • I think it is useful to distinguish models, methods, and programs. The model consists of the transition matrices, rates, etc. SIMMAP and phytools are (typically, at least) working with unordered characters (they may do other things also). If used for geography, this assumes (a) nothing can ever live in more than one area, (b) the only kind of dispersal is "range-switching" (e.g., sudden jump from A->B along a branch), (c) nothing happens at cladogenesis (or, rather, the range is copied exactly to both descendants: A->A,A).
  • This (or any other) model can then be used to (a) calculate the log-likelihood of the data (for ML or Bayesian parameter inference); (b) calculate the probability of ancestral states under the model (ancestral area reconstruction, or, better, ancestral range estimation); (c) stochastic mapping (for event counts, residence times, etc.) These are different methods that make use of one or more models.
  • The handy thing about Biogeographic Stochastic Mapping (BSM) in BioGeoBEARS is that it is programmed to do the stochastic mapping method (and ML ancestral states inference methods) on any of the programmed biogeographical models.
  • All of the programs can do the simpler models, but only BioGeoBEARS can do the complex biogeographic models. If you do want to run a simple standard character in BioGeoBEARS, this is totally possible. I call it the "BAYAREALIKE+a-d-e" model, with max_range_size=1 (I use this to do validation of BioGeoBEARS here: BioGeoBEARS Validation). So, BioGeoBEARS, SIMMAP, and phytools are three programs that can all run the stochastic mapping method on the standard unordered character model.

Biggest caveats

(link to this section)

The biggest caveats with Biogeographic Stochastic Mapping are those common to all available biogeographic models, namely:

  1. Even the best model might be wrong. In fact, it almost certainly is. "All models are wrong, but some models are useful." The event counts represent estimates of the number of events conditional on the model, the estimated model parameters, the phylogeny, and the data.
  2. The biggest worrisome assumption of all of these biogeographic models is that they treat the observed tree as the true tree, and thus they leave out extinction (and unsampled species, if those exist). This is a problem with almost all phylogenetic models (SSE models attempt to model extinction, but one can argue if they do it well), but might be worse for biogeographic models, since cladogenetic range-change processes are being postulated, so missing/extinct species mean missing cladogenetic events in the tree. The main thing missing-extinction-events means for event counts is that event counts are probably underestimates. Mostly what we are hoping is that we are at least getting the minimum number of events, and that these counts will correlate with the true events.
  3. If you have hugely biased extinction or sampling, then of course this will drastically effect results. This is common to all phylogenetic methods (a phylogeny of only large mammals would estimate that the common ancestor of mammals was large).

Note: Only *single*-letter codes for areas!

(link to this section)

Note: The stochastic mapping code assumes that all of the areas have single-letter codes — without that assumption, there is no efficient way to store/list ranges of multiple areas. So: *always* use single-letter codes for areas, if you ever plan to do stochastic mapping.

Example Biogeographical Stochastic Mapping (BSM) graphics

(link to this section)

Here are some graphics produced by a script like the example BSM script below.

Most probable states/ranges (time-stratified DEC and DEC+J)

(link to this section)

Psychotria_DEC_M3b_strat_v1_states.png
Psychotria_DEC+J_M3b_strat_v1_states.png
Figure Caption: Plot of the single-most probable ancestral state/range for a time-stratified DEC model on Hawaiian Psychotria. Note that the single-most probable range at a particular node could still have very low probability; pie charts (below) should be consulted to get a sense of this. Figure Caption: Plot of the single-most probable ancestral state/range for a time-stratified DEC+J model on Hawaiian Psychotria. Note that the single-most probable range at a particular node could still have very low probability; pie charts (below) should be consulted to get a sense of this.

Pie charts of states/ranges probabilities (time-stratified DEC and DEC+J)

(link to this section)

Psychotria_DEC_M3b_strat_v1_pies.png
Psychotria_DEC+J_M3b_strat_v1_pies.png
Figure Caption: Pie charts showing ancestral state/range probabilities for a time-stratified DEC model on Hawaiian Psychotria. Here, areas disappear and become unavailable as islands sink beneath the ocean going backward in time. Figure Caption: Pie charts showing ancestral state/range probabilities for a time-stratified DEC+J model on Hawaiian Psychotria. Here, areas disappear and become unavailable as islands sink beneath the ocean going backward in time.

Single stochastic map plot (time-stratified DEC and DEC+J)

(link to this section)

DEC_M3b_strat_single_stochastic_map_n1.png
DEC+J_M3b_strat_single_stochastic_map_n1.png
Figure Caption: Plot of a single Biogeographical Stochastic Map (BSM) for a time-stratified DEC model on Hawaiian Psychotria. Note that a single BSM is just a single realization of a possible history. A sample of BSMs should be examined to determine the variability in possible histories. Figure Caption: Plot of a single Biogeographical Stochastic Map (BSM) for a time-stratified DEC+J model on Hawaiian Psychotria. Note that a single BSM is just a single realization of a possible history. A sample of BSMs should be examined to determine the variability in possible histories.

Animation of 50 BSMs (time-stratified DEC and DEC+J)

(link to this section)

DEC_M3b_strat_50BSMs_v1_100x100.gif
DEC+J_M3b_strat_50BSMs_v1_100x100.gif
Figure Caption: Animation of 50 Biogeographical Stochastic Maps (BSMs) for a time-stratified DEC model on Hawaiian Psychotria. Here, areas disappear and become unavailable as islands sink beneath the ocean going backward in time. Figure Caption: Animation of 50 Biogeographical Stochastic Maps (BSMs) for a time-stratified DEC+J model on Hawaiian Psychotria. Here, areas disappear and become unavailable as islands sink beneath the ocean going backward in time.

Histograms of event counts across 50 BSMs (time-stratified DEC and DEC+J)

(link to this section)

Event counts under ML time-stratified DEC model Event counts under ML time-stratified DEC+J model
DEC_M3b_strat_histograms_of_event_counts.png
DEC+J_M3b_strat_histograms_of_event_counts.png
Figure Caption: Histograms of the counts of different kinds of events found in each of the 50 BSMs. The x-axis gives the number of events, the y-axis gives the number of BSMs in which a specific number of events was observed. Model is a time-stratified DEC model on Hawaiian Psychotria. Figure Caption: Histograms of the counts of different kinds of events found in each of the 50 BSMs. The x-axis gives the number of events, the y-axis gives the number of BSMs in which a specific number of events was observed. Model is a time-stratified DEC+J model on Hawaiian Psychotria.

Validation of BSMs against ML ancestral state probabilities (time-stratified DEC and DEC+J)

(link to this section)

DEC_M3b_strat_ML_vs_BSM.png
DEC+J_M3b_strat_ML_vs_BSM.png
Figure Caption: Ancestral state/range probabilities, versus the state probabilities estimated by taking the mean of 50 BSMs, for a time-stratified DEC model on Hawaiian Psychotria. Figure Caption: Ancestral state/range probabilities, versus the state probabilities estimated by taking the mean of 50 BSMs, for a time-stratified DEC+J model on Hawaiian Psychotria.

Example script for Biogeographical Stochastic Mapping (BSM)

(link to this section)

This script should "run out of the box", i.e. if you copy/paste into your R window. (On some machines, huge copy-pastes introduce errors, in that case, you have to copy-paste in moderate-sized chunks.) You could also save the code to a .R file and then run from the Terminal (from outside of R) with Rscript scriptfilename.R.

Notes:

  • This is a time-stratified DEC+J analysis. To run other models, or a non-time-stratified model you will have to edit the model setup; see the setups of 6 basic models in the main BioGeoBEARS example script.
  • As usual, you will have to change the script to specify your own working directory.
  • The input files (in this case, for a time-stratified analysis of Psychotria, with Hawaiian islands disappearing back in time) should auto-load from the BioGeoBEARS extdata directories specified. Change these directory locations/file names to use your own files.
  • Also, if you switch to a non-stratified analysis, be sure to change stratified=FALSE to stratified=TRUE.

Example BSM script below:

# Load the package (after installation, see above).
library(optimx)         # You need to have some version of optimx available
                        # as it is a BioGeoBEARS dependency; however, if you
                        # don't want to use optimx, and use optim() (from R core) 
                        # you can set:
                        # BioGeoBEARS_run_object$use_optimx = FALSE
                        # ...everything should work either way -- NJM 2014-01-08
library(FD)       # for FD::maxent() (make sure this is up-to-date)
library(snow)     # (if you want to use multicore functionality; some systems/R versions prefer library(parallel), try either)
library(parallel)
library(BioGeoBEARS)

########################################################
# TO GET THE OPTIMX/OPTIM FIX, AND THE UPPASS FIX, 
# SOURCE THE REVISED FUNCTIONS WITH THESE COMMANDS
#
# CRUCIAL CRUCIAL CRUCIAL: 
# YOU HAVE TO RUN THE SOURCE COMMANDS AFTER 
# *EVERY TIME* YOU DO library(BioGeoBEARS). THE CHANGES ARE NOT "PERMANENT", 
# THEY HAVE TO BE MADE EACH TIME.  IF YOU ARE GOING TO BE OFFLINE, 
# YOU CAN DOWNLOAD EACH .R FILE TO YOUR HARD DRIVE AND REFER THE source()
# COMMANDS TO THE FULL PATH AND FILENAME OF EACH FILE ON YOUR
# LOCAL SYSTEM INSTEAD.
########################################################
library(BioGeoBEARS)
source("http://phylo.wdfiles.com/local--files/biogeobears/cladoRcpp.R") # (needed now that traits model added; source FIRST!)
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_add_fossils_randomly_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_basics_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_calc_transition_matrices_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_classes_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_detection_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_DNA_cladogenesis_sim_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_extract_Qmat_COOmat_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_generics_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_models_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_on_multiple_trees_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_plots_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_readwrite_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_simulate_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_SSEsim_makePlots_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_SSEsim_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_stochastic_mapping_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_stratified_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_univ_model_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/calc_uppass_probs_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/calc_loglike_sp_v01.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/get_stratified_subbranch_top_downpass_likelihoods_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/runBSM_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/stochastic_map_given_inputs.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/summarize_BSM_tables_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_traits_v1.R") # added traits model
calc_loglike_sp = compiler::cmpfun(calc_loglike_sp_prebyte)    # crucial to fix bug in uppass calculations
calc_independent_likelihoods_on_each_branch = compiler::cmpfun(calc_independent_likelihoods_on_each_branch_prebyte)
    # slight speedup hopefully

#######################################################
# Local source()-ing method -- uses BioGeoBEARS sourceall() function 
# on a directory of .R files, so you don't have to type them out.
# The directories here are on my machine, you would have to make a 
# directory, save the .R files there, and refer to them.
#
# NOTE: it's best to source the "cladoRcpp.R" update first, to avoid warnings like this:
##
## Note: possible error in 'rcpp_calc_anclikes_sp_COOweights_faster(Rcpp_leftprobs = tmpca_1, ': 
##         unused arguments (m = m, m_null_range = include_null_range, jts_matrix = jts_matrix) 
##
#
# TO USE: Delete or comment out the 'source("http://...")' commands above, and un-comment
#              the below...
########################################################################
# Un-comment (and fix directory paths) to use:
#library(BioGeoBEARS)
#source("/drives/Dropbox/_njm/__packages/cladoRcpp_setup/cladoRcpp.R")
#sourceall("/drives/Dropbox/_njm/__packages/BioGeoBEARS_setup/")
#calc_loglike_sp = compiler::cmpfun(calc_loglike_sp_prebyte)    # crucial to fix bug in uppass calculations
#calc_independent_likelihoods_on_each_branch = compiler::cmpfun(calc_independent_likelihoods_on_each_branch_prebyte)
########################################################################

#######################################################
# SETUP: YOUR WORKING DIRECTORY
#######################################################
# You will need to set your working directory to match your local system

# Note these very handy functions!
# Command "setwd(x)" sets your working directory
# Command "getwd()" gets your working directory and tells you what it is.
# Command "list.files()" lists the files in your working directory
# To get help on any command, use "?".  E.g., "?list.files"

# Set your working directory for output files
# default here is your home directory ("~")
# Change this as you like
wd = "/drives/Dropbox/_njm/__packages/BioGeoBEARS_setup/inst/extdata/examples/Psychotria_M0/BGB_BSM_DEC+J_M3b_strat/"
setwd(wd)

# Double-check your working directory with getwd()
getwd()

#######################################################
# SETUP: Extension data directory
#######################################################
# When R packages contain extra files, they are stored in the "extdata" directory 
# inside the installed package.
#
# BioGeoBEARS contains various example files and scripts in its extdata directory.
# 
# Each computer operating system might install BioGeoBEARS in a different place, 
# depending on your OS and settings. 
# 
# However, you can find the extdata directory like this:
extdata_dir = np(system.file("extdata", package="BioGeoBEARS"))
extdata_dir
list.files(extdata_dir)

# Time-stratified files are here
extdata_dir2 = np(slashslash(paste(extdata_dir, "examples/Psychotria_M3strat/BGB/", sep="/")))

# "system.file" looks in the directory of a specified package (in this case BioGeoBEARS)
# The function "np" is just a shortcut for normalizePath(), which converts the 
# path to the format appropriate for your system (e.g., Mac/Linux use "/", but 
# Windows uses "\\", if memory serves).

# Even when using your own data files, you should KEEP these commands in your 
# script, since the plot_BioGeoBEARS_results function needs a script from the 
# extdata directory to calculate the positions of "corners" on the plot. This cannot
# be made into a straight up BioGeoBEARS function because it uses C routines 
# from the package APE which do not pass R CMD check for some reason.

#######################################################
# SETUP: YOUR TREE FILE AND GEOGRAPHY FILE
#######################################################
# Example files are given below. To run your own data,
# make the below lines point to your own files, e.g.
# trfn = "/mydata/frogs/frogBGB/tree.newick"
# geogfn = "/mydata/frogs/frogBGB/geog.data"

#######################################################
# Phylogeny file
# Notes: 
# 1. Must be binary/bifurcating: no polytomies
# 2. No negative branchlengths (e.g. BEAST MCC consensus trees sometimes have negative branchlengths)
# 3. Be careful of very short branches, as BioGeoBEARS will interpret ultrashort branches as direct ancestors
# 4. You can use non-ultrametric trees, but BioGeoBEARS will interpret any tips significantly below the 
#    top of the tree as fossils!  This is only a good idea if you actually do have fossils in your tree,
#    as in e.g. Wood, Matzke et al. (2013), Systematic Biology.
# 5. The default settings of BioGeoBEARS make sense for trees where the branchlengths are in units of 
#    millions of years, and the tree is 1-1000 units tall. If you have a tree with a total height of
#    e.g. 0.00001, you will need to adjust e.g. the max values of d and e, or (simpler) multiply all
#    your branchlengths to get them into reasonable units.
# 6. DON'T USE SPACES IN SPECIES NAMES, USE E.G. "_"
#######################################################
# This is the example Newick file for Hawaiian JuliaDupin
# (from Ree & Smith 2008)
# "trfn" = "tree file name"
trfn = np(paste(addslash(extdata_dir), "Psychotria_5.2.newick", sep=""))

# Look at the raw Newick file:
moref(trfn)

# Look at your phylogeny:
tr = read.tree(trfn)
tr
#plot(tr)
#title("Florestina phylogeny")
#axisPhylo() # plots timescale

#######################################################
# Geography file
# Notes:
# 1. This is a PHLYIP-formatted file. This means that in the 
#    first line, 
#    - the 1st number equals the number of rows (species)
#    - the 2nd number equals the number of columns (number of areas)
# 2. This is the same format used for C++ LAGRANGE geography files.
# 3. All names in the geography file must match names in the phylogeny file.
# 4. DON'T USE SPACES IN SPECIES NAMES, USE E.G. "_"
# 5. Operational taxonomic units (OTUs) should ideally be phylogenetic lineages, 
#    i.e. genetically isolated populations.  These may or may not be identical 
#    with species.  You would NOT want to just use specimens, as each specimen 
#    automatically can only live in 1 area, which will typically favor DEC+J 
#    models.  This is fine if the species/lineages really do live in single areas,
#    but you wouldn't want to assume this without thinking about it at least. 
#    In summary, you should collapse multiple specimens into species/lineages if 
#    data indicates they are the same genetic population.
######################################################

# This is the example geography file for Hawaiian JuliaDupin
# (from Ree & Smith 2008)
geogfn = np(paste(addslash(extdata_dir), "Psychotria_geog.data", sep=""))

# Look at the raw geography text file:
moref(geogfn)

# Look at your geographic range data:
tipranges = getranges_from_LagrangePHYLIP(lgdata_fn=geogfn)
tipranges

# Set the maximum number of areas any species may occupy; this cannot be larger 
# than the number of areas you set up, but it can be smaller.
max_range_size = 4

#######################################################
# Run DEC
#######################################################

BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE          # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$calc_ancprobs=TRUE    # get ancestral states from optim run

# Set up a time-stratified analysis 
# (un-comment to use; see example files in extdata_dir, 
#  and BioGeoBEARS google group posts for further hints)
BioGeoBEARS_run_object$timesfn = np(slashslash(paste(extdata_dir2, "timeperiods.txt", sep="/")))
BioGeoBEARS_run_object$dispersal_multipliers_fn = np(slashslash(paste(extdata_dir2, "dispersal_multipliers.txt", sep="/")))
BioGeoBEARS_run_object$areas_allowed_fn = np(slashslash(paste(extdata_dir2, "areas_allowed.txt", sep="/")))
#BioGeoBEARS_run_object$areas_adjacency_fn = "areas_adjacency.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.

# Input the maximum range size
BioGeoBEARS_run_object$max_range_size = max_range_size

# Multicore processing if desired
BioGeoBEARS_run_object$num_cores_to_use=1
# (use more cores to speed it up; this requires
# library(parallel) and/or library(snow). The package "parallel" 
# is now default on Macs in R 3.0+, but apparently still 
# has to be typed on some Windows machines. Note: apparently 
# parallel works on Mac command-line R, but not R.app.
# BioGeoBEARS checks for this and resets to 1
# core with R.app)

# Sparse matrix exponentiation is an option for huge numbers of ranges/states (600+)
# I have experimented with sparse matrix exponentiation in EXPOKIT/rexpokit,
# but the results are imprecise and so I haven't explored it further.
# In a Bayesian analysis, it might work OK, but the ML point estimates are
# not identical.
# Also, I have not implemented all functions to work with force_sparse=TRUE.
# Volunteers are welcome to work on it!!
BioGeoBEARS_run_object$force_sparse=FALSE

# Give BioGeoBEARS the location of the geography text file
BioGeoBEARS_run_object$geogfn = geogfn

# Give BioGeoBEARS the location of the phylogeny Newick file
BioGeoBEARS_run_object$trfn = trfn

# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)

# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table

# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE

# Set up DEC+J model

# Add j as a free parameter
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = 0.001
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = 0.001

# Look at the BioGeoBEARS_run_object; it's just a list of settings etc.
BioGeoBEARS_run_object

# This contains the model object
BioGeoBEARS_run_object$BioGeoBEARS_model_object

# This table contains the parameters of the model 
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table

# Run this to check inputs. Read the error messages if you get them!
check_BioGeoBEARS_run(BioGeoBEARS_run_object)

# For a slow analysis, run once, then set runslow=FALSE to just 
# load the saved result.
runslow = TRUE
resfn = "Psychotria_DEC+J_M3b_strat_v1.Rdata"
if (runslow)
    {
    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)
    } else {
    # Loads to "res"
    load(resfn)
    }

###########################################
# Pick your model name:
###########################################
model_name = "DEC+J_M3b_strat"
# res = resDEC

#######################################################
# Plot ancestral states - DEC
#######################################################
pdffn = paste0("Psychotria_", model_name, "_v1.pdf")
pdf(pdffn, width=6, height=6)

analysis_titletxt = paste0(model_name, " on Psychotria")

# Setup
results_object = res
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

dev.off()  # Turn off PDF
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr) # Plot it

#######################################################
# Stochastic mapping on DEC M3b stratified with islands coming up
#######################################################
clado_events_tables = NULL
ana_events_tables = NULL
lnum = 0

#######################################################
# Get the inputs for Biogeographical Stochastic Mapping
# Note: this can be slow for large state spaces and trees, since 
# the independent likelihoods for each branch are being pre-calculated
# E.g., for 10 areas, this requires calculation of a 1024x1024 matrix
# for each branch.  On a tree with ~800 tips and thus ~1600 branches, this was about 1.6 gigs
# for storage of "BSM_inputs_file.Rdata".
# Update: 2015-09-23 -- now, if you used multicore functionality for the ML analysis,
# the same settings will be used for get_inputs_for_stochastic_mapping().
#######################################################
BSM_inputs_fn = "BSM_inputs_file.Rdata"
runInputsSlow = TRUE
if (runInputsSlow)
    {
    stochastic_mapping_inputs_list = get_inputs_for_stochastic_mapping(res=res)
    save(stochastic_mapping_inputs_list, file=BSM_inputs_fn)
    } else {
    # Loads to "stochastic_mapping_inputs_list"
    load(BSM_inputs_fn)
    } # END if (runInputsSlow)

# Check inputs (doesn't work the same on unconstr)
names(stochastic_mapping_inputs_list)
stochastic_mapping_inputs_list$phy2
stochastic_mapping_inputs_list$COO_weights_columnar
stochastic_mapping_inputs_list$unconstr
set.seed(seed=as.numeric(Sys.time()))

runBSMslow = TRUE
if (runBSMslow == TRUE)
    {
    # Saves to: RES_clado_events_tables.Rdata
    # Saves to: RES_ana_events_tables.Rdata
    BSM_output = runBSM(res, stochastic_mapping_inputs_list=stochastic_mapping_inputs_list, maxnum_maps_to_try=100, nummaps_goal=50, maxtries_per_branch=40000, save_after_every_try=TRUE, savedir=getwd(), seedval=12345, wait_before_save=0.01)

    RES_clado_events_tables = BSM_output$RES_clado_events_tables
    RES_ana_events_tables = BSM_output$RES_ana_events_tables
    } else {
    # Load previously saved...

    # Loads to: RES_clado_events_tables
    load(file="RES_clado_events_tables.Rdata")
    # Loads to: RES_ana_events_tables
    load(file="RES_ana_events_tables.Rdata")
    BSM_output = NULL
    BSM_output$RES_clado_events_tables = RES_clado_events_tables
    BSM_output$RES_ana_events_tables = RES_ana_events_tables
    } # END if (runBSMslow == TRUE)

# Extract BSM output
clado_events_tables = BSM_output$RES_clado_events_tables
ana_events_tables = BSM_output$RES_ana_events_tables
head(clado_events_tables[[1]])
head(ana_events_tables[[1]])
length(clado_events_tables)
length(ana_events_tables)

#######################################################
# Plot one stochastic map, manual method
#######################################################
# (we have to convert the stochastic maps into event
#  maps for plotting)

######################
# Get the color scheme
######################
include_null_range = TRUE
areanames = names(tipranges@df)
areas = areanames
max_range_size = 4

# Note: If you did something to change the states_list from the default given the number of areas, you would
# have to manually make that change here as well! (e.g., areas_allowed matrix, or manual reduction of the states_list)
states_list_0based = rcpp_areas_list_to_states_list(areas=areas, maxareas=max_range_size, include_null_range=include_null_range)

colors_list_for_states = get_colors_for_states_list_0based(areanames=areanames, states_list_0based=states_list_0based, max_range_size=max_range_size, plot_null_range=TRUE)

############################################
# Setup for painting a single stochastic map
############################################
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
stratified=TRUE
clado_events_table = clado_events_tables[[1]]
ana_events_table = ana_events_tables[[1]]

# cols_to_get = names(clado_events_table[,-ncol(clado_events_table)])
# colnums = match(cols_to_get, names(ana_events_table))
# ana_events_table_cols_to_add = ana_events_table[,colnums]
# anagenetic_events_txt_below_node = rep("none", nrow(ana_events_table_cols_to_add))
# ana_events_table_cols_to_add = cbind(ana_events_table_cols_to_add, anagenetic_events_txt_below_node)
# rows_to_get_TF = ana_events_table_cols_to_add$node <= length(tr$tip.label)
# master_table_cladogenetic_events = rbind(ana_events_table_cols_to_add[rows_to_get_TF,], clado_events_table)

############################################
# Open a PDF
############################################
pdffn = paste0(model_name, "_single_stochastic_map_n1.pdf")
pdf(file=pdffn, width=6, height=6)

# Convert the BSM into a modified res object
master_table_cladogenetic_events = clado_events_tables[[1]]
resmod = stochastic_map_states_into_res(res=res, master_table_cladogenetic_events=master_table_cladogenetic_events, stratified=stratified)

plot_BioGeoBEARS_results(results_object=resmod, analysis_titletxt="Stochastic map", addl_params=list("j"), label.offset=0.5, plotwhat="text", cornercoords_loc=scriptdir, root.edge=TRUE, colors_list_for_states=colors_list_for_states, skiptree=FALSE, show.tip.label=TRUE)

# Paint on the branch states
paint_stochastic_map_branches(res=resmod, master_table_cladogenetic_events=master_table_cladogenetic_events, colors_list_for_states=colors_list_for_states, lwd=5, lty=par("lty"), root.edge=TRUE, stratified=stratified)

plot_BioGeoBEARS_results(results_object=resmod, analysis_titletxt="Stochastic map", addl_params=list("j"), plotwhat="text", cornercoords_loc=scriptdir, root.edge=TRUE, colors_list_for_states=colors_list_for_states, skiptree=TRUE, show.tip.label=TRUE)

############################################
# Close PDF
############################################
dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)

#######################################################
# Plot all 50 stochastic maps to PDF
#######################################################
# Setup
include_null_range = include_null_range
areanames = areanames
areas = areanames
max_range_size = max_range_size
states_list_0based = rcpp_areas_list_to_states_list(areas=areas, maxareas=max_range_size, include_null_range=include_null_range)
colors_list_for_states = get_colors_for_states_list_0based(areanames=areanames, states_list_0based=states_list_0based, max_range_size=max_range_size, plot_null_range=TRUE)
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
stratified = stratified

# Loop through the maps and plot to PDF
pdffn = paste0(model_name, "_", length(clado_events_tables), "BSMs_v1.pdf")
pdf(file=pdffn, width=6, height=6)

nummaps_goal = 50
for (i in 1:nummaps_goal)
    {
    clado_events_table = clado_events_tables[[i]]
    analysis_titletxt = paste0(model_name, " - Stochastic Map #", i, "/", nummaps_goal)
    plot_BSM(results_object=res, clado_events_table=clado_events_table, stratified=stratified, analysis_titletxt=analysis_titletxt, addl_params=list("j"), label.offset=0.5, plotwhat="text", cornercoords_loc=scriptdir, root.edge=TRUE, colors_list_for_states=colors_list_for_states, show.tip.label=TRUE, include_null_range=include_null_range)
    } # END for (i in 1:nummaps_goal)

dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)

#######################################################
# Summarize stochastic map tables
#######################################################
length(clado_events_tables)
length(ana_events_tables)

head(clado_events_tables[[1]][,-20])
tail(clado_events_tables[[1]][,-20])

head(ana_events_tables[[1]])
tail(ana_events_tables[[1]])

areanames = names(tipranges@df)
actual_names = areanames
actual_names

# Get the dmat and times (if any)
dmat_times = get_dmat_times_from_res(res=res, numstates=NULL)
dmat_times

# Extract BSM output
clado_events_tables = BSM_output$RES_clado_events_tables
ana_events_tables = BSM_output$RES_ana_events_tables

# Simulate the source areas
BSMs_w_sourceAreas = simulate_source_areas_ana_clado(res, clado_events_tables, ana_events_tables, areanames)
clado_events_tables = BSMs_w_sourceAreas$clado_events_tables
ana_events_tables = BSMs_w_sourceAreas$ana_events_tables

# Count all anagenetic and cladogenetic events
counts_list = count_ana_clado_events(clado_events_tables, ana_events_tables, areanames, actual_names)

summary_counts_BSMs = counts_list$summary_counts_BSMs
print(conditional_format_table(summary_counts_BSMs))

# Histogram of event counts
hist_event_counts(counts_list, pdffn=paste0(model_name, "_histograms_of_event_counts.pdf"))

#######################################################
# Check that ML ancestral state/range probabilities and
# the mean of the BSMs approximately line up
#######################################################
library(MultinomialCI)    # For 95% CIs on BSM counts
check_ML_vs_BSM(res, clado_events_tables, model_name, tr=NULL, plot_each_node=FALSE, linreg_plot=TRUE, MultinomialCI=TRUE)

Interpreting the outputs of Biogeographic Stochastic Mapping

(link to this section)

Interpreting the output of Biogeographic Stochastic Mapping. I had thought that the BSM script was fairly self-explanatory, but based on the questions I have gotten, it seems that many users are overwhelmed by the amount of stuff that prints to the screen. Here is the key: the important stuff for the user is at the very end of the script.

Here is the screen output that you get if you run the above script through to completion. Notice that the key function, doing all of the summary work for you, is count_ana_clado_events():

> clado_events_tables = BSMs_w_sourceAreas$clado_events_tables
> ana_events_tables = BSMs_w_sourceAreas$ana_events_tables
> 
> # Count all anagenetic and cladogenetic events
> counts_list = count_ana_clado_events(clado_events_tables, ana_events_tables, areanames, actual_names)

count_ana_clado_events() is counting events over Biogeographical Stochastic Map (BSM) #:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 

Range-switching dispersal (all observed 'a' dispersals):

means:
  K O M H
K 0 0 0 0
O 0 0 0 0
M 0 0 0 0
H 0 0 0 0

standard deviations:
  K O M H
K 0 0 0 0
O 0 0 0 0
M 0 0 0 0
H 0 0 0 0

Range-expansion dispersal (all observed 'd' dispersals):

means:
  K     O    M    H
K 0  0.12 0.50 0.48
O 0     0 0.54 0.52
M 0     0    0    0
H 0 0.020    0    0

standard deviations:
  K    O    M    H
K 0 0.39 0.51 0.50
O 0    0 0.50 0.50
M 0    0    0    0
H 0 0.14    0    0

Anagenetic dispersal (mean of all observed anagenetic 'a' or 'd' dispersals):

means:
  K     O    M    H
K 0  0.12 0.50 0.48
O 0     0 0.54 0.52
M 0     0    0    0
H 0 0.020    0    0

standard deviations:
  K    O    M    H
K 0 0.39 0.51 0.50
O 0    0 0.50 0.50
M 0    0    0    0
H 0 0.14    0    0

Cladogenetic dispersal (mean of all observed jump 'j' dispersals):

means:
     K    O    M H
K    0 3.06    0 0
O 0.26    0 1.98 0
M    0    0    0 0
H    0    0    0 0

standard deviations:
     K    O    M H
K    0 0.47    0 0
O 0.53    0 0.14 0
M    0    0    0 0
H    0    0    0 0

ALL dispersal (mean of all observed anagenetic 'a', 'd' dispersals, PLUS cladogenetic founder/jump dispersal):

means:
     K     O    M    H
K    0  3.18 0.50 0.48
O 0.26     0 2.52 0.52
M    0     0    0    0
H    0 0.020    0    0

standard deviations:
     K    O    M    H
K    0 0.39 0.51 0.50
O 0.53    0 0.50 0.50
M    0    0    0    0
H    0 0.14    0    0

Summary of event counts from 50 BSMs:
       founder a    d e subset vicariance sympatry ALL_disp ana_disp all_ana all_clado
means      5.3 0 2.18 0  0.080      0.080    12.54     7.48     2.18    2.18        18
stdevs    0.61 0 0.44 0   0.34       0.27     0.65     0.61     0.44    0.44         0
sums       265 0  109 0      4          4      627      374      109     109       900
       total_events
means         20.18
stdevs         0.44
sums           1009

A series of dispersal count matrices are printed to screen. In the dispersal count matrices, the row headings represent "from" (source areas), and the column headings represent "to" (destination areas).

The mean is the average number of dispersal events of each type, across 50 stochastic maps (or however many BSMs you ran). The second matrix is the standard deviation of these counts. Multiply the standard deviation by +/-1.96 to get the theoretical 95% confidence interval (if the histograms are normal distributions; if non-normal, doing a Bayesian credibility interval, using percentiles on the actual list of counts, would be better).

The first pair of matrices is for range-switching dispersal. Range-switching dispersal is controlled by the parameter 'a', which is fixed to 0 in DEC and related models, so all of these counts are zero.

The second pair of matrices is for for range-expansion dispersal. Range-switching dispersal is controlled by the parameter 'd', which is freely estimated in DEC and related models. If the d parameter is 0, these counts are zero, but otherwise there should be a few events. (In this particular DEC+J model on the time-stratified Hawaiian islands, there are only a few 'd' events, as founder-event dispersal dominates.)

The third pair of matrices adds up both types of anagenetic dispersal. Since 'a' was zero, this equals the 'd' matrices.

The fourth pair of matrices is for for founder-event/jump dispersal (which is cladogenetic). Founder-event/jump dispersal is controlled by the parameter 'j', which is freely estimated in DEC+J and related models. If the j parameter is 0, these counts are zero, but otherwise there should be a some j events. (In this particular DEC+J model on the time-stratified Hawaiian islands, j is quite a strong explainer of the data, as founder-event dispersal dominates.)

The fifth pair of matrices adds up ALL types of dispersal, both anagenetic and cladogenetic.

The last item printed to screen is "Summary of event counts from 50 BSMs". This contains the means and standard deviations of all event types. Readers will note that column "all_clado" (the sum of all cladogenetic events) has a mean of 18 events, with a standard deviation of 0. This is because there are 18 nodes in the Psychotria phylogeny, thus there are always exactly 18 cladogenetic events in every single stochastic map. This means, of course, that sympatric range-copying (e.g. K->K,K) is being counted as an event.

The row "sums" adds up all of the events across the 50 stochastic maps. Dividing the sums by 50 produces the row "means".

Accessing all of the summary tables

(link to this section)

Accessing all of the summary tables. All of these tables, as well as the raw counts per stochastic map, are stored in the object counts_list. You can see all of these items with names(counts_list). You can access individual elements with e.g. counts_list$founder_totals_list.

> names(counts_list)
 [1] "all_dispersals_counts_cube"         "anagenetic_dispersals_counts_cube" 
 [3] "founder_counts_cube"                "ana_counts_cube"                   
 [5] "a_counts_cube"                      "d_counts_cube"                     
 [7] "e_counts_rectangle"                 "unique_sub_counts"                 
 [9] "unique_vic_counts"                  "unique_sym_counts"                 
[11] "all_dispersals_totals_list"         "anagenetic_dispersals_totals_list" 
[13] "founder_totals_list"                "ana_totals_list"                   
[15] "a_totals_list"                      "d_totals_list"                     
[17] "e_totals_list"                      "subsetSymp_totals_list"            
[19] "vicariance_totals_list"             "sympatry_totals_list"              
[21] "clado_totals_list"                  "all_totals_list"                   
[23] "all_dispersals_counts_fromto_means" "ana_dispersals_counts_fromto_means"
[25] "founder_counts_fromto_means"        "a_counts_fromto_means"             
[27] "d_counts_fromto_means"              "all_dispersals_counts_fromto_sds"  
[29] "ana_dispersals_counts_fromto_sds"   "founder_counts_fromto_sds"         
[31] "a_counts_fromto_sds"                "d_counts_fromto_sds"               
[33] "summary_counts_BSMs"

> counts_list$founder_totals_list
 [1] 5 6 6 5 4 4 5 4 5 5 6 5 6 6 5 5 5 6 6 6 6 5 5 5 5 5 5 5 5 6 5 6 6 5 5 5 6 5 5 6 5 6 5 5 5 5
[47] 6 5 7 5

Accessing the tables storing all of the actual stochastic maps

(link to this section)

Accessing the tables storing all of the actual stochastic maps. If you run the script all of the way through to the end, the actual stochastic maps (exact event histories) are saved in these two lists of tables: clado_events_tables, and ana_events_tables.

The object clado_events_tables is a list of 50 tables, containing the cladogenetic range-inheritance events at each nodes. The object ana_events_tables is a list of 50 tables, containing the anagenetic events, on branches where anagenetic events occurred.

(Note that it is perfectly possible to have a stochastic map where no anagenetic events occur, particularly in smaller datasets and those with a strong j weight parameter.)

Readings lists and tables in R

(link to this section)

Accessing these tables requires some basic R skillz. I cannot teach these to everyone individually. If you are totally at sea with R, you should probably learn basic R before attempting to manipulate tables.

But, just you get you started, this is how you would access the third table in the list of 50 cladogenetic events tables: clado_events_tables[[3]]. This could be a huge table, so it could be useful to just look at the top or bottom of the table, with head() or tail().

# First few rows of the third clado_events_table:
> head(clado_events_tables[[3]])
    node node.type parent_br edge.length ancestor daughter_nds   time_bp fossils
110    1       tip         8    0.965685       27              0.0008898   FALSE
21     2       tip         9    0.965685       27              0.0008898   FALSE
31     3       tip        11    1.230219       28              0.0008898   FALSE
41     4       tip        12    1.230219       28              0.0008898   FALSE
51     5       tip        14    1.851022       29              0.0000000   FALSE
61     6       tip        15    1.851022       29              0.0000000   FALSE
                       label stratum time_top time_bot reltimept piecenum piececlass SUBnode
110 P_hawaiiensis_WaikamoiL1       1        0      0.5       0.5        1  subbranch       1
21           P_mauiensis_Eke       1        0      0.5       0.5        2  subbranch       2
31                P_fauriei2       1        0      0.5       0.5        3  subbranch       3
41             P_hathewayi_1       1        0      0.5       0.5        4  subbranch       4
51      P_kaduana_PuuKukuiAS       1        0      0.5       0.5        5  subbranch       5
61        P_mauiensis_PepeAS       1        0      0.5       0.5        6  subbranch       6
    SUBnode.type SUBparent_br SUBedge.length SUBancestor SUBdaughter_nds SUBtime_bp SUBfossils
110          tip            8       0.965685          27                  0.0008898      FALSE
21           tip            9       0.965685          27                  0.0008898      FALSE
31           tip           11       1.230219          28                  0.0008898      FALSE
41           tip           12       1.230219          28                  0.0008898      FALSE
51           tip           14       1.851022          29                  0.0000000      FALSE
61           tip           15       1.851022          29                  0.0000000      FALSE
                    SUBlabel clado_event_type clado_event_txt clado_dispersal_from
110 P_hawaiiensis_WaikamoiL1             <NA>            <NA>                     
21           P_mauiensis_Eke             <NA>            <NA>                     
31                P_fauriei2             <NA>            <NA>                     
41             P_hathewayi_1             <NA>            <NA>                     
51      P_kaduana_PuuKukuiAS             <NA>            <NA>                     
61        P_mauiensis_PepeAS             <NA>            <NA>                     
    clado_dispersal_to sampled_states_AT_nodes sampled_states_AT_brbots left_desc_nodes
110               <NA>                       4                        4              NA
21                <NA>                       4                        4              NA
31                <NA>                       3                        3              NA
41                <NA>                       3                        3              NA
51                <NA>                       4                        4              NA
61                <NA>                       4                        4              NA
    right_desc_nodes samp_LEFT_dcorner samp_RIGHT_dcorner anagenetic_events_txt_below_node
110               NA                NA                 NA                             none
21                NA                NA                 NA                             none
31                NA                NA                 NA                             none
41                NA                NA                 NA                             none
51                NA                NA                 NA                             none
61                NA                NA                 NA                             none

# Last few rows of the third clado_events_table:
> tail(clado_events_tables[[3]])
    node node.type parent_br edge.length ancestor daughter_nds  time_bp fossils    label stratum
211   21  internal         1   0.2873120       20       22, 35 4.912688      NA inNode21       4
221   22  internal         2   0.7345186       21       23, 31 4.178169      NA inNode22       4
361   36  internal        32   2.3720812       20       37, 19 2.827919      NA inNode36       4
212   21  internal         1   0.2873120       20       22, 35 4.912688      NA inNode21       5
362   36  internal        32   2.3720812       20       37, 19 2.827919      NA inNode36       5
20    20      root        NA          NA       NA       21, 36 5.200000      NA inNode20       5
    time_top time_bot reltimept piecenum piececlass SUBnode SUBnode.type SUBparent_br
211      3.7      5.1       1.4        1    subtree       4         root           NA
221      3.7      5.1       1.4        1    subtree       5     internal            1
361      3.7      5.1       1.4        2  subbranch       4          tip            6
212      5.1     10.0       4.9        1    subtree       1          tip            1
362      5.1     10.0       4.9        1    subtree       2          tip            2
20       5.1     10.0       4.9        1    subtree       3         root           NA
    SUBedge.length SUBancestor SUBdaughter_nds SUBtime_bp SUBfossils
211      0.1873120          NA            5, 3  1.2126880         NA
221      0.7345186           4            1, 2  0.4781694         NA
361      1.5000000           5                  0.0000000      FALSE
212      0.1000000           3                  0.0000000      FALSE
362      0.1000000           3                  0.0000000      FALSE
20              NA          NA            1, 2  0.1000000         NA
                                                                                                                                                                                                                                                                                            SUBlabel
211                                                                                                                                                                                                                                                                                          inNode4
221                                                                                                                                                                                                                                                                                          inNode5
361                                                                                                                                                                                                                                                       P_hexandra_K1,P_hexandra_M,P_hexandra_Oahu
212 P_fauriei2,P_hathewayi_1,P_hawaiiensis_WaikamoiL1,P_kaduana_PuuKukuiAS,P_mauiensis_Eke,P_mauiensis_PepeAS,P_greenwelliae07,P_greenwelliae907,P_kaduana_HawaiiLoa,P_grandiflora_Kal2,P_hobdyi_Kuia,P_hawaiiensis_Makaopuhi,P_mariniana_Kokee2,P_mariniana_MauiNui,P_mariniana_Oahu,P_wawraeDL7428
362                                                                                                                                                                                                                                                       P_hexandra_K1,P_hexandra_M,P_hexandra_Oahu
20                                                                                                                                                                                                                                                                                           inNode3
    clado_event_type clado_event_txt clado_dispersal_from clado_dispersal_to
211     sympatry (y)          K->K,K                                        
221     sympatry (y)          K->K,K                                        
361             <NA>            <NA>                                    <NA>
212                                                                         
362                                                                         
20      sympatry (y)          K->K,K                                        
    sampled_states_AT_nodes sampled_states_AT_brbots left_desc_nodes right_desc_nodes
211                       2                        2               5                3
221                       2                        2               1                2
361                       2                        2              NA               NA
212                       2                        2              NA               NA
362                       2                        2              NA               NA
20                        2                       NA               1                2
    samp_LEFT_dcorner samp_RIGHT_dcorner anagenetic_events_txt_below_node
211                 2                  2                             none
221                 2                  2                             none
361                NA                 NA                             none
212                NA                 NA                             none
362                NA                 NA                             none
20                  2                  2                             <NA>

# Names of the columns of one of the clado_events_table:
> names(clado_events_tables[[15]])
 [1] "node"                             "node.type"                       
 [3] "parent_br"                        "edge.length"                     
 [5] "ancestor"                         "daughter_nds"                    
 [7] "time_bp"                          "fossils"                         
 [9] "label"                            "stratum"                         
[11] "time_top"                         "time_bot"                        
[13] "reltimept"                        "piecenum"                        
[15] "piececlass"                       "SUBnode"                         
[17] "SUBnode.type"                     "SUBparent_br"                    
[19] "SUBedge.length"                   "SUBancestor"                     
[21] "SUBdaughter_nds"                  "SUBtime_bp"                      
[23] "SUBfossils"                       "SUBlabel"                        
[25] "clado_event_type"                 "clado_event_txt"                 
[27] "clado_dispersal_from"             "clado_dispersal_to"              
[29] "sampled_states_AT_nodes"          "sampled_states_AT_brbots"        
[31] "left_desc_nodes"                  "right_desc_nodes"                
[33] "samp_LEFT_dcorner"                "samp_RIGHT_dcorner"              
[35] "anagenetic_events_txt_below_node"

# Find out what cladogenetic event happened at the root node (node 20) of the 23rd cladogenetic events table:
# Which row has node 20?  Use a TRUE/FALSE (TF) statement:
rootnode_row_TF = clado_events_tables[[23]]$node == 20

# print that row to screen:
> clado_events_tables[[23]][rootnode_row_TF,]
   node node.type parent_br edge.length ancestor daughter_nds time_bp fossils    label stratum
20   20      root        NA          NA       NA       21, 36     5.2      NA inNode20       5
   time_top time_bot reltimept piecenum piececlass SUBnode SUBnode.type SUBparent_br
20      5.1       10       4.9        1    subtree       3         root           NA
   SUBedge.length SUBancestor SUBdaughter_nds SUBtime_bp SUBfossils SUBlabel clado_event_type
20             NA          NA            1, 2        0.1         NA  inNode3     sympatry (y)
   clado_event_txt clado_dispersal_from clado_dispersal_to sampled_states_AT_nodes
20          K->K,K                                                               2
   sampled_states_AT_brbots left_desc_nodes right_desc_nodes samp_LEFT_dcorner
20                       NA               1                2                 2
   samp_RIGHT_dcorner anagenetic_events_txt_below_node
20                  2                             <NA>

# print the actual event (several equivalent methods):
> clado_events_tables[[23]]$clado_event_txt[rootnode_row_TF]
[1] "K->K,K"

> clado_events_tables[[23]]$clado_event_txt[75]
[1] "K->K,K"

# (since I happen to know in this case, that the root node is on row 75, and the column "clado_event_txt" is column 26
> clado_events_tables[[23]][75,26]
[1] "K->K,K"

The meanings of the column headings of Biogeographic Stochastic Mapping events tables

(link to this section)

I am out of time to do a full tutorial, but here is a quick description of what each column in these events tables is supposed to mean.

NOTE: REQUIRED BACKGROUND is an understanding of node numbers in APE's phylo objects. See here:

Node numbering in APE/BioGeoBEARS
http://phylo.wikidot.com/example-biogeobears-scripts#node_numbering

That link also discusses the BioGeoBEARS function prt() (print tree to table), which is also necessary for understanding the BSM event tables output.

The column headings that come from prt() and chainsaw2()

(link to this section)

#######################################################
# Reading BSM output
#######################################################
# The key outputs are:
# 
# 1. The anagenetic events table:
head(ana_events_tables[[1]])

# 2. The cladogenetic events table:
tail(clado_events_tables[[1]][,-20])

# The first 9 columns of these tables are information about the 
# nodes and branch of the tree. These columns are output
# by the BioGeoBEARS function prt() ("print tree to table").
# See here for a mini-tutorial about node numbers and branch 
# numbers in APE/R, and about prt():
# http://phylo.wikidot.com/example-biogeobears-scripts#node_numbering

# MEANINGS OF PRT() columns:

# (Note: Like biologists/paleontologists, we are assuming tips are the top 
#  of the tree, and the root is at the bottom. Computer scientists often
#  do it backwards.)
#
# node         - The node number used in the standard APE phylo object
# node.type    - Is the node "tip", "internal", or "root"?
# parent_br    - What is the branch number for the branch below the node?
#                (parent_br gives you the row of tr$edge and tr$edge.length)
# edge.length  - The branch length
# ancestor     - The node number at the bottom of the branch below the node
# daughter_nds - The node numbers of the daughters above the node
# time_bp      - The time-before-present of the node (where the highest tip is age 0)
# fossils      - If a tip, is it a fossil tip?  (as determined by the 
#                fossils_older_than option of prt()
# label        - The node name. Equals the tip species if a tip node, or "inNode"+
#                node number, if internal. 
# tipnames     - (optional output) If get_tipnames=TRUE in prt(), this column
#                contains the alphabetical, comma-delimited list of all tip species
#                that descend from that node. This is useful as a unique
#                identifier of a clade or bipartition across trees, which can be a 
#                huge problem in APE/R (which can relabel node numbers after any
#                change, rotation, etc.)

# COLUMNS DEALING WITH TIME-STRATIFICATION

# Background: A time-stratified tree, in R, is a nightmare. Imagine taking a
#             phylogeny, then taking a chainsaw to it, cutting across it
#             at several different timepoints. Now you have a pile of "tree pieces";
#             some of these will be subtrees (almost always with branches below the 
#             subtree root node), and some will just be segments (or "1-branch trees",
#             if you like).
#             
#             Keeping track of all of this in R pretty much sucks, as the code 
#             has to pass all of the likelihoods up and down, keep track of which
#             branches connect to what, have special code for single-branch segments,
#             more special code for fossil branches that don't come up to 0, etc.
#             This would all be much easier with an object-oriented approach, but,
#             well, I'm stuck with this unless/until I reprogram everything from
#             scratch in a new language. (Yes, I should have listened to my advisor
#             and done everything in C++, but then we wouldn't have all the nice 
#             graphics, and I might still be fighting with compilers on some Windows32
#             machine instead of doing science.)
#            
#             Anyhow, the tree-pieces are generated by the BioGeoBEARS function chainsaw2().
#             This generates a list of tree-pieces. To keep track of all of this I have a 
#             big table, which includes the information from prt(), and then has
#             similar information for each tree-piece.

# These columns track which stratum a tree piece is in, the beginning and ending of 
# that stratum, the relative time point of the node within that stratum, and 
# number and type of the tree-piece in the tree-pieces list produced by chainsaw2():
# 
# stratum
# time_top
# time_bot
# reltimept
# piecenum
# piececlass
# 
#
# The "SUB" columns contain the prt() output, but for the subtree pieces:
# SUBnode
# SUBnode.type
# SUBparent_br
# SUBedge.length
# SUBancestor
# SUBdaughter_nds
# SUBtime_bp
# SUBfossils
# SUBlabel
#

The meanings of the column headings of clado_events_tables

(link to this section)

# COLUMNS RECORDING CLADOGENETIC RANGE-INHERITANCE EVENTS
# clado_event_type                   - What type of event at cladogenesis? (and letter
#                                      for the parameter that BioGeoBEARS uses to control
#                                      the process; see Figure 1 of Matzke (2013, Frontiers
#                                      in Biogeography).
#                                      sympatry (y)
#                                      vicariance (v)
#                                      subset sympatry (s)
#                                      founder-event/jump dispersal (j)
# clado_event_txt                    - What specific event happened?  E.g.:
#                                      K->K,K would be a narrow sympatry event
#                                      ABC->ABC,ABC would be a broad sympatry event
#                                      ABC->A,BC would a vicariance event, etc.
#                                      (NOTE: This is one reason SINGLE-LETTER CODES
#                                             are REQUIRED by BSM)
# clado_dispersal_to                 - *IF* it's a dispersal event, dispersal to where?
# sampled_states_AT_nodes            - What's the state number of the range at the node?
#                                      (1-based index: typically,
#                                       1=null range
#                                       2=A
#                                       3=B
#                                       4=AB ...for a 2-area system)
# sampled_states_AT_brbots           - State number at the bottom of the branch below the node
# left_desc_nodes                    - State number at top of the left descendant branch
# right_desc_nodes                   - State number at top of the right descendant branch
# samp_LEFT_dcorner                  - State number just after speciation of the node
#                                      (bottom of the left descendant branch)
# samp_RIGHT_dcorner                 - State number just after speciation of the node
#                                      (bottom of the right descendant branch)
# anagenetic_events_txt_below_node   - For the branch below the node, records the 
#                                      events along the branch, in a plain-text, semicolon-
#                                      delimited format. Use the function 
#                                      events_txt_list_into_events_table() to convert
#                                      these to a table of events (as in the object 
#                                      ana_events_tables)
#

The meanings of the column headings of ana_events_tables

(link to this section)

# COLUMNS RECORDING ANAGENETIC RANGE-INHERITANCE EVENTS
# 
# NOTE CAREFULLY: An ana_events_table include cladogenetic information, because it is
# copied from the clado_events_table when clado_events_table$anagenetic_events_txt_below_node
# is converted into an ana_events_table.  This helps us to keep track of the states at
# the top and bottom of each branch. However, this information will be COPIED to 
# every anagenetic event. So, if there are several events on branch, the same cladogenetic
# information will be copied to every row of the anagenetic events table. SHORT VERSION:
# DON'T COUNT CLADOGENETIC EVENTS FROM THE ANAGENETIC EVENTS TABLE; THAT IS WHAT THE
# CLADOGENETIC EVENTS TABLE IS FOR!
#
# COLUMNS SPECIFIC TO AN ana_events_table (the others are as above)
#
# trynum                    - The number of tries that occurred before a successful
#                             simulation of this branch history.
#                             (If trynum is greater than the user-set maxtries, then
#                              the history was created with the so-called "manual" 
#                              option that finds the minimum events and puts them 
#                              in random order; see the BioGeoBEARS Google Group.)
# brlen                     - The branch length (or length of the segment, when time-
#                              stratified).
# current_rangenum_1based    - The 1-based index of the range (the state) before the event.
# new_rangenum_1based        - The 1-based index of the range (the state) after the event.
# current_rangetxt          - The text version of the range before the event.
# new_rangetxt              - The text version of the range after the event.
# abs_event_time            - The absolute time of the event (time before the highest 
#                             tip in the overall tree, which marks time=0 my before 
#                             present.
# event_time                - Relative event time (within the time stratum).
# event_type                - Type of event (currently allowed anagenetic events are
#                             d, e, and sometimes a).
# event_txt                 - Event text, eg. A->AB or ABC->BC.
# new_area_num_1based       - The 1-based index of the new area (note: area indexes are
#                             different than range indexes!) for a d or a event.
# lost_area_num_1based      - The 1-based index for a lost area for an e event.
# ana_dispersal_from        - The source area (important when there are multiple
#                             possible areas, e.g. ABC -> ABCF. The source area is
#                             probabilistically chosen according to the dispersal
#                             constraints etc. used in the model. The function doing this
#                             is simulate_source_areas_ana_clado(), run near the end of
#                             the example script.
# dispersal_to              - The area being dispersed to (text version).
# extirpation_from          - The area being lost (text version).
#

How to convert multi-page PDFs to animations

(link to this section)

In terms of producing nice animated GIFs for websites and presentations, I have had the best luck (on my Mac OS X) with ImageMagick: http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=25640 .

On Macs at least, ImageMagick's "convert" function will require that you have ghostscript (gs) installed (http://ghostscript.com/download/gsdnld.html). If you use MacPorts to install ImageMagick, ghostscript should come along with that.

For example, in Terminal, navigate to the directory with your PDF, then:

convert -delay 20 -density 100x100 -dispose previous DEC+J_M3b_strat_50BSMs_v1.pdf -layers coalesce DEC+J_M3b_strat_50BSMs_v1_100x100.gif

I have found that 200x200 is already a pretty big file. Going very high (e.g. 1000x1000) seems to freeze my computer etc.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License