Table of Contents
|
Introduction
I have now added Biogeographical Stochastic Mapping (BSM) to my basic collection of BioGeoBEARS code, now that it has been validated (see BioGeoBEARS validation).
When I get a chance to do a CRAN update (a major project, as I have to thoroughly revise the documentation etc.), it will be available there as well. For now, you should source() the full collection of update R files in the example script (which you should do anyway, after each time you invoke library(BioGeoBEARS)).
See my introduction and example animation graphics for Biogeographical Stochastic Mapping here: Stochastic mapping under biogeographical models (updated April 2015).
Citation of Biogeographical Stochastic Mapping
Note 2016: The best paper to cite regarding Biogeographical Stochastic Mapping is Dupin et al. (2016, //Journal of Biogeography//), where we first describe the basics of the method, and its utility. Eventually there may be a more specialist application note on the method, but this is behind several other projects.
Citation of Biogeographical Stochastic Mapping:
1. Dupin, Julia; Matzke, Nicholas J.; Sarkinen, Tiina; Knapp, Sandra; Olmstead, Richard; Bohs, Lynn; Smith, Stacey (2016).
"Bayesian estimation of the global biogeographic history of the Solanaceae. Journal of Biogeography, 44(4), 887-899.
http://dx.doi.org/10.1111/jbi.128982. Matzke, Nicholas J. (2016). "Stochastic mapping under biogeographical models." PhyloWiki BioGeoBEARS website, 2016,
http://phylo.wikidot.com/biogeobears#stochastic_mapping.
Clarifications about Biogeographic Stochastic Mapping (BSM)
Brief notes (hopefully written up more later):
- Because of the programs SIMMAP and phytools, some people think of stochastic mapping as an alternate model to ancestral range inference, or they think that Biogeographic Stochastic Mapping is the same thing as what SIMMAP and phytools are doing. In reality:
- I think it is useful to distinguish models, methods, and programs. The model consists of the transition matrices, rates, etc. SIMMAP and phytools are (typically, at least) working with unordered characters (they may do other things also). If used for geography, this assumes (a) nothing can ever live in more than one area, (b) the only kind of dispersal is "range-switching" (e.g., sudden jump from A->B along a branch), (c) nothing happens at cladogenesis (or, rather, the range is copied exactly to both descendants: A->A,A).
- This (or any other) model can then be used to (a) calculate the log-likelihood of the data (for ML or Bayesian parameter inference); (b) calculate the probability of ancestral states under the model (ancestral area reconstruction, or, better, ancestral range estimation); (c) stochastic mapping (for event counts, residence times, etc.) These are different methods that make use of one or more models.
- The handy thing about Biogeographic Stochastic Mapping (BSM) in BioGeoBEARS is that it is programmed to do the stochastic mapping method (and ML ancestral states inference methods) on any of the programmed biogeographical models.
- All of the programs can do the simpler models, but only BioGeoBEARS can do the complex biogeographic models. If you do want to run a simple standard character in BioGeoBEARS, this is totally possible. I call it the "BAYAREALIKE+a-d-e" model, with max_range_size=1 (I use this to do validation of BioGeoBEARS here: BioGeoBEARS Validation). So, BioGeoBEARS, SIMMAP, and phytools are three programs that can all run the stochastic mapping method on the standard unordered character model.
Biggest caveats
The biggest caveats with Biogeographic Stochastic Mapping are those common to all available biogeographic models, namely:
- Even the best model might be wrong. In fact, it almost certainly is. "All models are wrong, but some models are useful." The event counts represent estimates of the number of events conditional on the model, the estimated model parameters, the phylogeny, and the data.
- The biggest worrisome assumption of all of these biogeographic models is that they treat the observed tree as the true tree, and thus they leave out extinction (and unsampled species, if those exist). This is a problem with almost all phylogenetic models (SSE models attempt to model extinction, but one can argue if they do it well), but might be worse for biogeographic models, since cladogenetic range-change processes are being postulated, so missing/extinct species mean missing cladogenetic events in the tree. The main thing missing-extinction-events means for event counts is that event counts are probably underestimates. Mostly what we are hoping is that we are at least getting the minimum number of events, and that these counts will correlate with the true events.
- If you have hugely biased extinction or sampling, then of course this will drastically effect results. This is common to all phylogenetic methods (a phylogeny of only large mammals would estimate that the common ancestor of mammals was large).
Note: Only *single*-letter codes for areas!
Note: The stochastic mapping code assumes that all of the areas have single-letter codes — without that assumption, there is no efficient way to store/list ranges of multiple areas. So: *always* use single-letter codes for areas, if you ever plan to do stochastic mapping.
Example Biogeographical Stochastic Mapping (BSM) graphics
Here are some graphics produced by a script like the example BSM script below.
Most probable states/ranges (time-stratified DEC and DEC+J)
Figure Caption: Plot of the single-most probable ancestral state/range for a time-stratified DEC model on Hawaiian Psychotria. Note that the single-most probable range at a particular node could still have very low probability; pie charts (below) should be consulted to get a sense of this. | Figure Caption: Plot of the single-most probable ancestral state/range for a time-stratified DEC+J model on Hawaiian Psychotria. Note that the single-most probable range at a particular node could still have very low probability; pie charts (below) should be consulted to get a sense of this. |
Pie charts of states/ranges probabilities (time-stratified DEC and DEC+J)
Figure Caption: Pie charts showing ancestral state/range probabilities for a time-stratified DEC model on Hawaiian Psychotria. Here, areas disappear and become unavailable as islands sink beneath the ocean going backward in time. | Figure Caption: Pie charts showing ancestral state/range probabilities for a time-stratified DEC+J model on Hawaiian Psychotria. Here, areas disappear and become unavailable as islands sink beneath the ocean going backward in time. |
Single stochastic map plot (time-stratified DEC and DEC+J)
Figure Caption: Plot of a single Biogeographical Stochastic Map (BSM) for a time-stratified DEC model on Hawaiian Psychotria. Note that a single BSM is just a single realization of a possible history. A sample of BSMs should be examined to determine the variability in possible histories. | Figure Caption: Plot of a single Biogeographical Stochastic Map (BSM) for a time-stratified DEC+J model on Hawaiian Psychotria. Note that a single BSM is just a single realization of a possible history. A sample of BSMs should be examined to determine the variability in possible histories. |
Animation of 50 BSMs (time-stratified DEC and DEC+J)
Figure Caption: Animation of 50 Biogeographical Stochastic Maps (BSMs) for a time-stratified DEC model on Hawaiian Psychotria. Here, areas disappear and become unavailable as islands sink beneath the ocean going backward in time. | Figure Caption: Animation of 50 Biogeographical Stochastic Maps (BSMs) for a time-stratified DEC+J model on Hawaiian Psychotria. Here, areas disappear and become unavailable as islands sink beneath the ocean going backward in time. |
Histograms of event counts across 50 BSMs (time-stratified DEC and DEC+J)
Event counts under ML time-stratified DEC model | Event counts under ML time-stratified DEC+J model |
Figure Caption: Histograms of the counts of different kinds of events found in each of the 50 BSMs. The x-axis gives the number of events, the y-axis gives the number of BSMs in which a specific number of events was observed. Model is a time-stratified DEC model on Hawaiian Psychotria. | Figure Caption: Histograms of the counts of different kinds of events found in each of the 50 BSMs. The x-axis gives the number of events, the y-axis gives the number of BSMs in which a specific number of events was observed. Model is a time-stratified DEC+J model on Hawaiian Psychotria. |
Validation of BSMs against ML ancestral state probabilities (time-stratified DEC and DEC+J)
NOTE: Most of the BSM validations are found at: BioGeoBEARS Validation: BioGeoBEARS DEC and DEC+J -- ancestral state/range probabilities versus BSMs
Figure Caption: Ancestral state/range probabilities, versus the state probabilities estimated by taking the mean of 50 BSMs, for a time-stratified DEC model on Hawaiian Psychotria. | Figure Caption: Ancestral state/range probabilities, versus the state probabilities estimated by taking the mean of 50 BSMs, for a time-stratified DEC+J model on Hawaiian Psychotria. |
Non-time-stratified script for Biogeographical Stochastic Mapping (BSM) — run AFTER basic example script
The script below can be run, verbatim, after the standard BioGeoBEARS example script that runs DEC, DEC+J, DIVALIKE, DIVALIKE+J, BAYAREALIKE, and BAYAREALIKE+J. This script only runs Biogeographical Stochastic Mapping on the DEC model, but it could be easily modified to run it under the other models.
NOTE: THIS IS NOT A COMPLETE SCRIPT BY ITSELF. IT ASSUMES YOU HAVE JUST RUN THE standard BioGeoBEARS example script, which covers the source() commands, the working directories, etc.
I am including this run-BSM-after-the-example-script, because many users run the basic example script, then try running the time-stratified BSM example script, which won't work without a few minor modifications (to change the time-stratified BSM script into the non-time-stratified BSM script, the modifications are basically: change the working directory, skip the ML runs if you have already done those, then set model_name, set res=resDEC or whatever model you have saved, and set stratified=FALSE).
model_name = "DEC"
res = resDEC
pdffn = paste0("Psychotria_", model_name, "_v1.pdf")
pdf(pdffn, height=6, width=6)
analysis_titletxt = paste0(model_name, " on Psychotria")
# Setup
results_object = res
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
dev.off() # Turn off PDF
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr) # Plot it
#######################################################
# Stochastic mapping on DEC M3b stratified with islands coming up
#######################################################
clado_events_tables = NULL
ana_events_tables = NULL
lnum = 0
#######################################################
# Get the inputs for Biogeographical Stochastic Mapping
# Note: this can be slow for large state spaces and trees, since
# the independent likelihoods for each branch are being pre-calculated
# E.g., for 10 areas, this requires calculation of a 1024x1024 matrix
# for each branch. On a tree with ~800 tips and thus ~1600 branches, this was about 1.6 gigs
# for storage of "BSM_inputs_file.Rdata".
# Update: 2015-09-23 -- now, if you used multicore functionality for the ML analysis,
# the same settings will be used for get_inputs_for_stochastic_mapping().
#######################################################
BSM_inputs_fn = "BSM_inputs_file.Rdata"
runInputsSlow = TRUE
if (runInputsSlow)
{
stochastic_mapping_inputs_list = get_inputs_for_stochastic_mapping(res=res)
save(stochastic_mapping_inputs_list, file=BSM_inputs_fn)
} else {
# Loads to "stochastic_mapping_inputs_list"
load(BSM_inputs_fn)
} # END if (runInputsSlow)
# Check inputs (doesn't work the same on unconstr)
names(stochastic_mapping_inputs_list)
stochastic_mapping_inputs_list$phy2
stochastic_mapping_inputs_list$COO_weights_columnar
stochastic_mapping_inputs_list$unconstr
set.seed(seed=as.numeric(Sys.time()))
runBSMslow = TRUE
if (runBSMslow == TRUE)
{
# Saves to: RES_clado_events_tables.Rdata
# Saves to: RES_ana_events_tables.Rdata
# Bug check:
# stochastic_mapping_inputs_list=stochastic_mapping_inputs_list; maxnum_maps_to_try=100; nummaps_goal=50; maxtries_per_branch=40000; save_after_every_try=TRUE; savedir=getwd(); seedval=12345; wait_before_save=0.01; master_nodenum_toPrint=0
BSM_output = runBSM(res, stochastic_mapping_inputs_list=stochastic_mapping_inputs_list, maxnum_maps_to_try=100, nummaps_goal=50, maxtries_per_branch=40000, save_after_every_try=TRUE, savedir=getwd(), seedval=12345, wait_before_save=0.01, master_nodenum_toPrint=0)
RES_clado_events_tables = BSM_output$RES_clado_events_tables
RES_ana_events_tables = BSM_output$RES_ana_events_tables
} else {
# Load previously saved...
# Loads to: RES_clado_events_tables
load(file="RES_clado_events_tables.Rdata")
# Loads to: RES_ana_events_tables
load(file="RES_ana_events_tables.Rdata")
BSM_output = NULL
BSM_output$RES_clado_events_tables = RES_clado_events_tables
BSM_output$RES_ana_events_tables = RES_ana_events_tables
} # END if (runBSMslow == TRUE)
# Extract BSM output
clado_events_tables = BSM_output$RES_clado_events_tables
ana_events_tables = BSM_output$RES_ana_events_tables
head(clado_events_tables[[1]])
head(ana_events_tables[[1]])
length(clado_events_tables)
length(ana_events_tables)
include_null_range = TRUE
areanames = names(tipranges@df)
areas = areanames
max_range_size = 4
# Note: If you did something to change the states_list from the default given the number of areas, you would
# have to manually make that change here as well! (e.g., areas_allowed matrix, or manual reduction of the states_list)
states_list_0based = rcpp_areas_list_to_states_list(areas=areas, maxareas=max_range_size, include_null_range=include_null_range)
colors_list_for_states = get_colors_for_states_list_0based(areanames=areanames, states_list_0based=states_list_0based, max_range_size=max_range_size, plot_null_range=TRUE)
############################################
# Setup for painting a single stochastic map
############################################
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
stratified = FALSE
clado_events_table = clado_events_tables[[1]]
ana_events_table = ana_events_tables[[1]]
# cols_to_get = names(clado_events_table[,-ncol(clado_events_table)])
# colnums = match(cols_to_get, names(ana_events_table))
# ana_events_table_cols_to_add = ana_events_table[,colnums]
# anagenetic_events_txt_below_node = rep("none", nrow(ana_events_table_cols_to_add))
# ana_events_table_cols_to_add = cbind(ana_events_table_cols_to_add, anagenetic_events_txt_below_node)
# rows_to_get_TF = ana_events_table_cols_to_add$node <= length(tr$tip.label)
# master_table_cladogenetic_events = rbind(ana_events_table_cols_to_add[rows_to_get_TF,], clado_events_table)
############################################
# Open a PDF
############################################
pdffn = paste0(model_name, "_single_stochastic_map_n1.pdf")
pdf(file=pdffn, height=6, width=6)
# Convert the BSM into a modified res object
master_table_cladogenetic_events = clado_events_tables[[1]]
resmod = stochastic_map_states_into_res(res=res, master_table_cladogenetic_events=master_table_cladogenetic_events, stratified=stratified)
plot_BioGeoBEARS_results(results_object=resmod, analysis_titletxt="Stochastic map", addl_params=list("j"), label.offset=0.5, plotwhat="text", cornercoords_loc=scriptdir, root.edge=TRUE, colors_list_for_states=colors_list_for_states, skiptree=FALSE, show.tip.label=TRUE)
# Paint on the branch states
paint_stochastic_map_branches(res=resmod, master_table_cladogenetic_events=master_table_cladogenetic_events, colors_list_for_states=colors_list_for_states, lwd=5, lty=par("lty"), root.edge=TRUE, stratified=stratified)
plot_BioGeoBEARS_results(results_object=resmod, analysis_titletxt="Stochastic map", addl_params=list("j"), plotwhat="text", cornercoords_loc=scriptdir, root.edge=TRUE, colors_list_for_states=colors_list_for_states, skiptree=TRUE, show.tip.label=TRUE)
############################################
# Close PDF
############################################
dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)
#######################################################
# Plot all 50 stochastic maps to PDF
#######################################################
# Setup
include_null_range = include_null_range
areanames = areanames
areas = areanames
max_range_size = max_range_size
states_list_0based = rcpp_areas_list_to_states_list(areas=areas, maxareas=max_range_size, include_null_range=include_null_range)
colors_list_for_states = get_colors_for_states_list_0based(areanames=areanames, states_list_0based=states_list_0based, max_range_size=max_range_size, plot_null_range=TRUE)
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
stratified = stratified
# Loop through the maps and plot to PDF
pdffn = paste0(model_name, "_", length(clado_events_tables), "BSMs_v1.pdf")
pdf(file=pdffn, height=6, width=6)
nummaps_goal = 50
for (i in 1:nummaps_goal)
{
clado_events_table = clado_events_tables[[i]]
analysis_titletxt = paste0(model_name, " - Stochastic Map #", i, "/", nummaps_goal)
plot_BSM(results_object=res, clado_events_table=clado_events_table, stratified=stratified, analysis_titletxt=analysis_titletxt, addl_params=list("j"), label.offset=0.5, plotwhat="text", cornercoords_loc=scriptdir, root.edge=TRUE, colors_list_for_states=colors_list_for_states, show.tip.label=TRUE, include_null_range=include_null_range)
} # END for (i in 1:nummaps_goal)
dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)
#######################################################
# Summarize stochastic map tables
#######################################################
length(clado_events_tables)
length(ana_events_tables)
head(clado_events_tables[[1]][,-20])
tail(clado_events_tables[[1]][,-20])
head(ana_events_tables[[1]])
tail(ana_events_tables[[1]])
areanames = names(tipranges@df)
actual_names = areanames
actual_names
# Get the dmat and times (if any)
dmat_times = get_dmat_times_from_res(res=res, numstates=NULL)
dmat_times
# Extract BSM output
clado_events_tables = BSM_output$RES_clado_events_tables
ana_events_tables = BSM_output$RES_ana_events_tables
# Simulate the source areas
BSMs_w_sourceAreas = simulate_source_areas_ana_clado(res, clado_events_tables, ana_events_tables, areanames)
clado_events_tables = BSMs_w_sourceAreas$clado_events_tables
ana_events_tables = BSMs_w_sourceAreas$ana_events_tables
# Count all anagenetic and cladogenetic events
counts_list = count_ana_clado_events(clado_events_tables, ana_events_tables, areanames, actual_names)
summary_counts_BSMs = counts_list$summary_counts_BSMs
print(conditional_format_table(summary_counts_BSMs))
# Histogram of event counts
hist_event_counts(counts_list, pdffn=paste0(model_name, "_histograms_of_event_counts.pdf"))
#######################################################
# Print counts to files
#######################################################
tmpnames = names(counts_list)
cat("\n\nWriting tables* of counts to tab-delimited text files:\n(* = Tables have dimension=2 (rows and columns). Cubes (dimension 3) and lists (dimension 1) will not be printed to text files.) \n\n")
for (i in 1:length(tmpnames))
{
cmdtxt = paste0("item = counts_list$", tmpnames[i])
eval(parse(text=cmdtxt))
# Skip cubes
if (length(dim(item)) != 2)
{
next()
}
outfn = paste0(tmpnames[i], ".txt")
if (length(item) == 0)
{
cat(outfn, " -- NOT written, *NO* events recorded of this type", sep="")
cat("\n")
} else {
cat(outfn)
cat("\n")
write.table(conditional_format_table(item), file=outfn, quote=FALSE, sep="\t", col.names=TRUE, row.names=TRUE)
} # END if (length(item) == 0)
} # END for (i in 1:length(tmpnames))
cat("...done.\n")
#######################################################
# Check that ML ancestral state/range probabilities and
# the mean of the BSMs approximately line up
#######################################################
library(MultinomialCI) # For 95% CIs on BSM counts
check_ML_vs_BSM(res, clado_events_tables, model_name, tr=NULL, plot_each_node=FALSE, linreg_plot=TRUE, MultinomialCI=TRUE)
Time-stratified example script for Biogeographical Stochastic Mapping (BSM)
NOTE: SEE ALSO [link to this section NON-TIME-STRATIFIED BSM SCRIPT], ABOVE.
This script should "run out of the box", i.e. if you copy/paste into your R window. (On some machines, huge copy-pastes introduce errors, in that case, you have to copy-paste in moderate-sized chunks.) You could also save the code to a .R file and then run from the Terminal (from outside of R) with Rscript scriptfilename.R.
Notes:
- This is a time-stratified DEC analysis. To run other models, or a non-time-stratified model you will have to edit the model setup; see the setups of 6 basic models in the main BioGeoBEARS example script.
- As usual, you will have to change the script to specify your own working directory.
- The input files (in this case, for a time-stratified analysis of Psychotria, with Hawaiian islands disappearing back in time) should auto-load from the BioGeoBEARS extdata directories specified. Change these directory locations/file names to use your own files.
- Also, if you switch to a non-stratified analysis, be sure to change stratified=FALSE to stratified=TRUE.
Example BSM script below:
#######################################################
# This is an introductory example script for the
# R package "BioGeoBEARS" by Nick Matzke
#
# All scripts are copyright Nicholas J. Matzke,
# please cite if you use. License: GPL-3
# http://cran.r-project.org/web/licenses/GPL-3
#
# I am happy to answer questions at matzke@nimbios.org, but
# I am more happy to answer questions on the
# BioGeoBEARS google group
#
# The package is designed for ML and Bayesian inference
# of
#
# (a) ancestral geographic ranges, and
#
# (b) perhaps more importantly, models for the
# evolution of geographic range across a phylogeny.
#
# The example below implements and compares:
#
# (1) The standard 2-parameter DEC model implemented in
# the program LAGRANGE (Ree & Smith 2008); users will
# notice that the ML parameter inference and log-
# likelihoods are identical
#
# (2) A DEC+J model implemented in BioGeoBEARS, wherein
# a third parameter, j, is added, representing the
# relative per-event weight of founder-event / jump
# speciation events at cladogenesis events. The
# higher j is, the more probability these events have,
# and the less probability the standard LAGRANGE
# cladogenesis events have.
#
# (3) Some standard model-testing (LRT and AIC) is
# implemented at the end so that users may compare models
#
# (4) The script does similar tests of a DIVA-like model (Ronquist 1997)
# and a BAYAREA-like model (Landis, Matzke, Moore, & Huelsenbeck, 2013)
#
#######################################################
#######################################################
# Installing BioGeoBEARS from GitHub latest version
#######################################################
# CUT 2018: The old instructions to source() online upgrade .R files have been deleted,
# all updates are now on the GitHub version of the package, version 1.1+
#######################################################
# Paste the stuff below, INSIDE the single-quote (') marks
# but NOT the single-quote marks themselves.
install_cmds_that_work_as_of_2023='
# Installation-from-scratch commands
install.packages("devtools")
install.packages("ape")
install.packages("optimx")
install.packages("GenSA")
install.packages("rexpokit")
install.packages("cladoRcpp")
install.packages("snow")
install.packages("MultinomialCI")
library(devtools)
devtools::install_github(repo="nmatzke/BioGeoBEARS")
# NOTE: If you get a message like this
# * select "2. CRAN packages only" on "3. None"
# * If you get asked about "binaries" vs. "source", choose "binaries"
# (binaries are precompiled and easy to install; installing from source
# requires that your computer have the correct compilers, which can be
# challenging if you are not fairly expert)
' # END install_cmds_that_work_as_of_2023
#######################################################
#######################################################
#######################################################
# SETUP -- libraries/BioGeoBEARS updates
#######################################################
# Load the package (after installation, see above).
library(ape)
library(optimx) # optimx seems better than R's default optim()
library(GenSA) # GenSA seems better than optimx (but slower) on 5+ parameters,
# seems to sometimes fail on simple problems (2-3 parameters)
library(rexpokit)
library(cladoRcpp)
library(snow) # (if you want to use multicore functionality; some systems/R versions prefer library(parallel), try either)
library(parallel)
library(BioGeoBEARS)
#######################################################
# SETUP: YOUR WORKING DIRECTORY
#######################################################
# You will need to set your working directory to match your local system
# Note these very handy functions!
# Command "setwd(x)" sets your working directory
# Command "getwd()" gets your working directory and tells you what it is.
# Command "list.files()" lists the files in your working directory
# To get help on any command, use "?". E.g., "?list.files"
# Set your working directory for output files
# default here is your home directory ("~")
# Change this as you like
wd = np("~")
setwd(wd)
# Double-check your working directory with getwd()
getwd()
#######################################################
# SETUP: Extension data directory
#######################################################
# When R packages contain extra files, they are stored in the "extdata" directory
# inside the installed package.
#
# BioGeoBEARS contains various example files and scripts in its extdata directory.
#
# Each computer operating system might install BioGeoBEARS in a different place,
# depending on your OS and settings.
#
# However, you can find the extdata directory like this:
extdata_dir = np(system.file("extdata", package="BioGeoBEARS"))
extdata_dir
list.files(extdata_dir)
# "system.file" looks in the directory of a specified package (in this case BioGeoBEARS)
# The function "np" is just a shortcut for normalizePath(), which converts the
# path to the format appropriate for your system (e.g., Mac/Linux use "/", but
# Windows uses "\\", if memory serves).
# Even when using your own data files, you should KEEP these commands in your
# script, since the plot_BioGeoBEARS_results function needs a script from the
# extdata directory to calculate the positions of "corners" on the plot. This cannot
# be made into a straight up BioGeoBEARS function because it uses C routines
# from the package APE which do not pass R CMD check for some reason.
#######################################################
# SETUP: YOUR TREE FILE AND GEOGRAPHY FILE
#######################################################
# Example files are given below. To run your own data,
# make the below lines point to your own files, e.g.
# trfn = "/mydata/frogs/frogBGB/tree.newick"
# geogfn = "/mydata/frogs/frogBGB/geog.data"
#######################################################
# Phylogeny file
# Notes:
# 1. Must be binary/bifurcating: no polytomies
# 2. No negative branchlengths (e.g. BEAST MCC consensus trees sometimes have negative branchlengths)
# 3. Be careful of very short branches, as BioGeoBEARS will interpret ultrashort branches as direct ancestors
# 4. You can use non-ultrametric trees, but BioGeoBEARS will interpret any tips significantly below the
# top of the tree as fossils! This is only a good idea if you actually do have fossils in your tree,
# as in e.g. Wood, Matzke et al. (2013), Systematic Biology.
# 5. The default settings of BioGeoBEARS make sense for trees where the branchlengths are in units of
# millions of years, and the tree is 1-1000 units tall. If you have a tree with a total height of
# e.g. 0.00001, you will need to adjust e.g. the max values of d and e, or (simpler) multiply all
# your branchlengths to get them into reasonable units.
# 6. DON'T USE SPACES IN SPECIES NAMES, USE E.G. "_"
#######################################################
# This is the example Newick file for Hawaiian Psychotria
# (from Ree & Smith 2008)
# "trfn" = "tree file name"
trfn = np(paste(addslash(extdata_dir), "Psychotria_5.2.newick", sep=""))
# Look at the raw Newick file:
moref(trfn)
# Look at your phylogeny (plots to a PDF, which avoids issues with multiple graphics in same window):
pdffn = "tree.pdf"
pdf(file=pdffn, width=9, height=12)
tr = read.tree(trfn)
tr
plot(tr)
title("Example Psychotria phylogeny from Ree & Smith (2008)")
axisPhylo() # plots timescale
dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# Geography file
# Notes:
# 1. This is a PHYLIP-formatted file. This means that in the
# first line,
# - the 1st number equals the number of rows (species)
# - the 2nd number equals the number of columns (number of areas)
# - after a tab, put the areas in parentheses, with spaces: (A B C D)
#
# 1.5. Example first line:
# 10 4 (A B C D)
#
# 2. The second line, and subsequent lines:
# speciesA 0110
# speciesB 0111
# speciesC 0001
# ...
#
# 2.5a. This means a TAB between the species name and the area 0/1s
# 2.5b. This also means NO SPACE AND NO TAB between the area 0/1s.
#
# 3. See example files at:
# http://phylo.wikidot.com/biogeobears#files
#
# 4. Make you understand what a PLAIN-TEXT EDITOR is:
# http://phylo.wikidot.com/biogeobears#texteditors
#
# 3. The PHYLIP format is the same format used for C++ LAGRANGE geography files.
#
# 4. All names in the geography file must match names in the phylogeny file.
#
# 5. DON'T USE SPACES IN SPECIES NAMES, USE E.G. "_"
#
# 6. Operational taxonomic units (OTUs) should ideally be phylogenetic lineages,
# i.e. genetically isolated populations. These may or may not be identical
# with species. You would NOT want to just use specimens, as each specimen
# automatically can only live in 1 area, which will typically favor DEC+J
# models. This is fine if the species/lineages really do live in single areas,
# but you wouldn't want to assume this without thinking about it at least.
# In summary, you should collapse multiple specimens into species/lineages if
# data indicates they are the same genetic population.
######################################################
# This is the example geography file for Hawaiian Psychotria
# (from Ree & Smith 2008)
geogfn = np(paste(addslash(extdata_dir), "Psychotria_geog.data", sep=""))
# Look at the raw geography text file:
moref(geogfn)
# Look at your geographic range data:
tipranges = getranges_from_LagrangePHYLIP(lgdata_fn=geogfn)
tipranges
# Maximum range size observed:
max(rowSums(dfnums_to_numeric(tipranges@df)))
# Set the maximum number of areas any species may occupy; this cannot be larger
# than the number of areas you set up, but it can be smaller.
max_range_size = 4
#######################################################
# START MANUAL MODIFICATION OF STATES LIST
# Manually modify the list of states (geographic ranges)
# to disallow some ranges that are disjunct
# (allow more complex scenarios than the area-adjacency file)
#######################################################
#######################################################
# NOTE! "areas" and "states/ranges" are DIFFERENT THINGS
# NOTHING WILL MAKE SENSE UNLESS you understand that a
# STATE/RANGE is made up of some number of areas.
#
# E.g.: 2 areas (A,B) equals
# 4 states/geographic ranges (null, A, B, AB)
#######################################################
# Get your states list (assuming, say, 4-area analysis, with max. rangesize=4)
areas = getareas_from_tipranges_object(tipranges)
#areas = c("A", "B", "C", "D")
# This is the list of states/ranges, where each state/range
# is a list of areas, counting from 0
states_list_0based = rcpp_areas_list_to_states_list(areas=areas, maxareas=max_range_size, include_null_range=TRUE)
# How many states/ranges, by default: 163
length(states_list_0based)
# Make the list of ranges
ranges_list = NULL
for (i in 1:length(states_list_0based))
{
if ( (length(states_list_0based[[i]]) == 1) && (is.na(states_list_0based[[i]])) )
{
tmprange = "_"
} else {
tmprange = paste(areas[states_list_0based[[i]]+1], collapse="")
}
ranges_list = c(ranges_list, tmprange)
}
# Look at the ranges list
ranges_list
# How many states/ranges, by default: 163
length(ranges_list)
# Here, we are assuming islands disappear back in time:
# 0.5 - H sinks (Hawaii Big Island)
# 1.9 - M sinks (Maui-Nui)
# 3.7 - O sinks (Oahu)
# 5.1 - K sinks (Kauai; although, we leave it up in the default Psychotria example; older islands are another setup)
# 10 - (a time older than the root of the tree, defining the oldest time-bin)
# Let's remove ranges disallowed by island disappearence:
# STRATUM 1 (youngest)
disallowed1 = c()
keepTF1 = ranges_list %in% disallowed1 == FALSE
ranges_list_NEW1 = ranges_list[keepTF1]
length(ranges_list_NEW1) # now 148
# STRATUM 2
ranges_to_lose = ranges_list_NEW1[grep(pattern="H", x=ranges_list)]
keepTF2 = ranges_list_NEW1 %in% ranges_to_lose == FALSE
ranges_list_NEW2 = ranges_list_NEW1[keepTF2]
ranges_list_NEW2
length(ranges_list_NEW2)
# STRATUM 3
ranges_to_lose = ranges_list_NEW2[grep(pattern="M", x=ranges_list_NEW2)]
keepTF3 = ranges_list_NEW2 %in% ranges_to_lose == FALSE
ranges_list_NEW3 = ranges_list_NEW2[keepTF3]
ranges_list_NEW3
length(ranges_list_NEW3)
# STRATUM 4 (oldest)
ranges_to_lose = ranges_list_NEW3[grep(pattern="O", x=ranges_list_NEW3)]
keepTF4 = ranges_list_NEW3 %in% ranges_to_lose == FALSE
ranges_list_NEW4 = ranges_list_NEW3[keepTF4]
ranges_list_NEW4
length(ranges_list_NEW4)
# Make the stratum-specific states list
states_list_0based_NEW1 = states_list_0based[keepTF1]
states_list_0based_NEW2 = states_list_0based_NEW1[keepTF2]
states_list_0based_NEW3 = states_list_0based_NEW2[keepTF3]
states_list_0based_NEW4 = states_list_0based_NEW3[keepTF4]
states_list_0based_NEW5 = states_list_0based_NEW4 # (copy of same, for the oldest time-bin)
# INPUT the NEW states list into the BioGeoBEARS_run_object, STRATIFIED
lists_of_states_lists_0based = list()
lists_of_states_lists_0based[[1]] = states_list_0based_NEW1
lists_of_states_lists_0based[[2]] = states_list_0based_NEW2
lists_of_states_lists_0based[[3]] = states_list_0based_NEW3
lists_of_states_lists_0based[[4]] = states_list_0based_NEW4
lists_of_states_lists_0based[[5]] = states_list_0based_NEW5
# Check it by eye (null range=NA; the islands KOMH are 0123 in 0-based numbering)
lists_of_states_lists_0based
#######################################################
# END MANUAL MODIFICATION OF THE STATES LIST
#######################################################
####################################################
####################################################
# KEY HINT: The number of states (= number of different possible geographic ranges)
# depends on (a) the number of areas and (b) max_range_size.
# If you have more than about 500-600 states, the calculations will get REALLY slow,
# since the program has to exponentiate a matrix of e.g. 600x600. Often the computer
# will just sit there and crunch, and never get through the calculation of the first
# likelihood.
#
# (this is also what is usually happening when LAGRANGE hangs: you have too many states!)
#
# To check the number of states for a given number of ranges, try:
numstates_from_numareas(numareas=4, maxareas=4, include_null_range=TRUE)
numstates_from_numareas(numareas=4, maxareas=4, include_null_range=FALSE)
numstates_from_numareas(numareas=4, maxareas=3, include_null_range=TRUE)
numstates_from_numareas(numareas=4, maxareas=2, include_null_range=TRUE)
# Large numbers of areas have problems:
numstates_from_numareas(numareas=10, maxareas=10, include_null_range=TRUE)
# ...unless you limit the max_range_size:
numstates_from_numareas(numareas=10, maxareas=2, include_null_range=TRUE)
####################################################
####################################################
#######################################################
#######################################################
# DEC AND DEC+J ANALYSIS
#######################################################
#######################################################
# NOTE: The BioGeoBEARS "DEC" model is identical with
# the Lagrange DEC model, and should return identical
# ML estimates of parameters, and the same
# log-likelihoods, for the same datasets.
#
# Ancestral state probabilities at nodes will be slightly
# different, since BioGeoBEARS is reporting the
# ancestral state probabilities under the global ML
# model, and Lagrange is reporting ancestral state
# probabilities after re-optimizing the likelihood
# after fixing the state at each node. These will
# be similar, but not identical. See Matzke (2014),
# Systematic Biology, for discussion.
#
# Also see Matzke (2014) for presentation of the
# DEC+J model.
#######################################################
#######################################################
#######################################################
#######################################################
#######################################################
# Run DEC
#######################################################
# Intitialize a default model (DEC model)
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
# Give BioGeoBEARS the location of the phylogeny Newick file
BioGeoBEARS_run_object$trfn = trfn
# Give BioGeoBEARS the location of the geography text file
BioGeoBEARS_run_object$geogfn = geogfn
# Input the maximum range size
BioGeoBEARS_run_object$max_range_size = max_range_size
BioGeoBEARS_run_object$min_branchlength = 0.000001 # Min to treat tip as a direct ancestor (no speciation event)
BioGeoBEARS_run_object$include_null_range = TRUE # set to FALSE for e.g. DEC* model, DEC*+J, etc.
# (For DEC* and other "*" models, please cite: Massana, Kathryn A.; Beaulieu,
# Jeremy M.; Matzke, Nicholas J.; O’Meara, Brian C. (2015). Non-null Effects of
# the Null Range in Biogeographic Models: Exploring Parameter Estimation in the
# DEC Model. bioRxiv, http://biorxiv.org/content/early/2015/09/16/026914 )
# Also: search script on "include_null_range" for other places to change
# Set up a time-stratified analysis:
# 1. Here, un-comment ONLY the files you want to use.
# 2. Also un-comment "BioGeoBEARS_run_object = section_the_tree(...", below.
# 3. For example files see (a) extdata_dir,
# or (b) http://phylo.wikidot.com/biogeobears#files
# and BioGeoBEARS Google Group posts for further hints)
#
# Uncomment files you wish to use in time-stratified analyses:
BioGeoBEARS_run_object$timesfn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/timeperiods.txt")
BioGeoBEARS_run_object$dispersal_multipliers_fn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/dispersal_multipliers.txt")
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$areas_adjacency_fn = "areas_adjacency.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.
# Speed options and multicore processing if desired
BioGeoBEARS_run_object$on_NaN_error = -1e50 # returns very low lnL if parameters produce NaN error (underflow check)
BioGeoBEARS_run_object$speedup = TRUE # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$use_optimx = TRUE # if FALSE, use optim() instead of optimx();
# if "GenSA", use Generalized Simulated Annealing, which seems better on high-dimensional
# problems (5+ parameters), but seems to sometimes fail to optimize on simple problems
BioGeoBEARS_run_object$num_cores_to_use = 1
# (use more cores to speed it up; this requires
# library(parallel) and/or library(snow). The package "parallel"
# is now default on Macs in R 3.0+, but apparently still
# has to be typed on some Windows machines. Note: apparently
# parallel works on Mac command-line R, but not R.app.
# BioGeoBEARS checks for this and resets to 1
# core with R.app)
# Sparse matrix exponentiation is an option for huge numbers of ranges/states (600+)
# I have experimented with sparse matrix exponentiation in EXPOKIT/rexpokit,
# but the results are imprecise and so I haven't explored it further.
# In a Bayesian analysis, it might work OK, but the ML point estimates are
# not identical.
# Also, I have not implemented all functions to work with force_sparse=TRUE.
# Volunteers are welcome to work on it!!
BioGeoBEARS_run_object$force_sparse = FALSE # force_sparse=TRUE causes pathology & isn't much faster at this scale
# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE, fossils_older_than=0.001, cut_fossils=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table
# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE # get ancestral states from optim run
# Set up DEC model
# (nothing to do; defaults)
# Look at the BioGeoBEARS_run_object; it's just a list of settings etc.
BioGeoBEARS_run_object
# This contains the model object
BioGeoBEARS_run_object$BioGeoBEARS_model_object
# This table contains the parameters of the model
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table
# Add the manually-constructed, time-stratified list of allowed states
BioGeoBEARS_run_object$lists_of_states_lists_0based = lists_of_states_lists_0based
# Run this to check inputs. Read the error messages if you get them!
BioGeoBEARS_run_object = fix_BioGeoBEARS_params_minmax(BioGeoBEARS_run_object=BioGeoBEARS_run_object)
check_BioGeoBEARS_run(BioGeoBEARS_run_object)
# For a slow analysis, run once, then set runslow=FALSE to just
# load the saved result.
runslow = TRUE
resfn = "Psychotria_DEC_M3_time-stratified_v1.Rdata"
if (runslow)
{
res = bears_optim_run(BioGeoBEARS_run_object)
res
save(res, file=resfn)
resDEC = res
} else {
# Loads to "res"
load(resfn)
resDEC = res
}
#######################################################
# Run DEC+J
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size
BioGeoBEARS_run_object$min_branchlength = 0.000001 # Min to treat tip as a direct ancestor (no speciation event)
BioGeoBEARS_run_object$include_null_range = TRUE # set to FALSE for e.g. DEC* model, DEC*+J, etc.
# (For DEC* and other "*" models, please cite: Massana, Kathryn A.; Beaulieu,
# Jeremy M.; Matzke, Nicholas J.; O’Meara, Brian C. (2015). Non-null Effects of
# the Null Range in Biogeographic Models: Exploring Parameter Estimation in the
# DEC Model. bioRxiv, http://biorxiv.org/content/early/2015/09/16/026914 )
# Also: search script on "include_null_range" for other places to change
# Set up a time-stratified analysis:
BioGeoBEARS_run_object$timesfn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/timeperiods.txt")
BioGeoBEARS_run_object$dispersal_multipliers_fn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/dispersal_multipliers.txt")
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$areas_adjacency_fn = "areas_adjacency.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.
# Speed options and multicore processing if desired
BioGeoBEARS_run_object$on_NaN_error = -1e50 # returns very low lnL if parameters produce NaN error (underflow check)
BioGeoBEARS_run_object$speedup = TRUE # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$use_optimx = TRUE # if FALSE, use optim() instead of optimx();
# if "GenSA", use Generalized Simulated Annealing, which seems better on high-dimensional
# problems (5+ parameters), but seems to sometimes fail to optimize on simple problems
BioGeoBEARS_run_object$num_cores_to_use = 1
BioGeoBEARS_run_object$force_sparse = FALSE # force_sparse=TRUE causes pathology & isn't much faster at this scale
# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE, fossils_older_than=0.001, cut_fossils=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table
# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE # get ancestral states from optim run
# Set up DEC+J model
# Get the ML parameter values from the 2-parameter nested model
# (this will ensure that the 3-parameter model always does at least as good)
dstart = resDEC$outputs@params_table["d","est"]
estart = resDEC$outputs@params_table["e","est"]
jstart = 0.0001
# Input starting values for d, e
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","init"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","est"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","init"] = estart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","est"] = estart
# Add j as a free parameter
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = jstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = jstart
# Add the manually-constructed, time-stratified list of allowed states
BioGeoBEARS_run_object$lists_of_states_lists_0based = lists_of_states_lists_0based
# Run this to check inputs. Read the error messages if you get them!
BioGeoBEARS_run_object = fix_BioGeoBEARS_params_minmax(BioGeoBEARS_run_object=BioGeoBEARS_run_object)
check_BioGeoBEARS_run(BioGeoBEARS_run_object)
resfn = "Psychotria_DEC+J_M3_time-stratified_v1.Rdata"
runslow = TRUE
if (runslow)
{
#sourceall("/Dropbox/_njm/__packages/BioGeoBEARS_setup/")
res = bears_optim_run(BioGeoBEARS_run_object)
res
save(res, file=resfn)
resDECj = res
} else {
# Loads to "res"
load(resfn)
resDECj = res
}
#######################################################
# PDF plots
#######################################################
pdffn = "Psychotria_DEC_vs_DEC+J_M3_time-stratified_v1.pdf"
pdf(pdffn, height=6, width=6)
#######################################################
# Plot ancestral states - DEC
#######################################################
analysis_titletxt ="BioGeoBEARS DEC on Psychotria M3_time-stratified"
# Setup
results_object = resDEC
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
#######################################################
# Plot ancestral states - DECJ
#######################################################
analysis_titletxt ="BioGeoBEARS DEC+J on Psychotria M3_time-stratified"
# Setup
results_object = resDECj
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
# States
res1 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
dev.off() # Turn off PDF
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr) # Plot it
#######################################################
#######################################################
# DIVALIKE AND DIVALIKE+J ANALYSIS
#######################################################
#######################################################
# NOTE: The BioGeoBEARS "DIVALIKE" model is not identical with
# Ronquist (1997)'s parsimony DIVA. It is a likelihood
# interpretation of DIVA, constructed by modelling DIVA's
# processes the way DEC does, but only allowing the
# processes DIVA allows (widespread vicariance: yes; subset
# sympatry: no; see Ronquist & Sanmartin 2011, Figure 4).
#
# DIVALIKE is a likelihood interpretation of parsimony
# DIVA, and it is "like DIVA" -- similar to, but not
# identical to, parsimony DIVA.
#
# I thus now call the model "DIVALIKE", and you should also. ;-)
#######################################################
#######################################################
#######################################################
# Run DIVALIKE
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size
BioGeoBEARS_run_object$min_branchlength = 0.000001 # Min to treat tip as a direct ancestor (no speciation event)
BioGeoBEARS_run_object$include_null_range = TRUE # set to FALSE for e.g. DEC* model, DEC*+J, etc.
# (For DEC* and other "*" models, please cite: Massana, Kathryn A.; Beaulieu,
# Jeremy M.; Matzke, Nicholas J.; O’Meara, Brian C. (2015). Non-null Effects of
# the Null Range in Biogeographic Models: Exploring Parameter Estimation in the
# DEC Model. bioRxiv, http://biorxiv.org/content/early/2015/09/16/026914 )
# Also: search script on "include_null_range" for other places to change
# Set up a time-stratified analysis:
BioGeoBEARS_run_object$timesfn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/timeperiods.txt")
BioGeoBEARS_run_object$dispersal_multipliers_fn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/dispersal_multipliers.txt")
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$areas_adjacency_fn = "areas_adjacency.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.
# Speed options and multicore processing if desired
BioGeoBEARS_run_object$on_NaN_error = -1e50 # returns very low lnL if parameters produce NaN error (underflow check)
BioGeoBEARS_run_object$speedup = TRUE # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$use_optimx = TRUE # if FALSE, use optim() instead of optimx();
# if "GenSA", use Generalized Simulated Annealing, which seems better on high-dimensional
# problems (5+ parameters), but seems to sometimes fail to optimize on simple problems
BioGeoBEARS_run_object$num_cores_to_use = 1
BioGeoBEARS_run_object$force_sparse = FALSE # force_sparse=TRUE causes pathology & isn't much faster at this scale
# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE, fossils_older_than=0.001, cut_fossils=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table
# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE # get ancestral states from optim run
# Set up DIVALIKE model
# Remove subset-sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "2-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "ysv*1/2"
# Allow classic, widespread vicariance; all events equiprobable
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","init"] = 0.5
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","est"] = 0.5
# No jump dispersal/founder-event speciation
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = 0.01
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = 0.01
# Add the manually-constructed, time-stratified list of allowed states
BioGeoBEARS_run_object$lists_of_states_lists_0based = lists_of_states_lists_0based
# Run this to check inputs. Read the error messages if you get them!
BioGeoBEARS_run_object = fix_BioGeoBEARS_params_minmax(BioGeoBEARS_run_object=BioGeoBEARS_run_object)
check_BioGeoBEARS_run(BioGeoBEARS_run_object)
runslow = TRUE
resfn = "Psychotria_DIVALIKE_M3_time-stratified_v1.Rdata"
if (runslow)
{
res = bears_optim_run(BioGeoBEARS_run_object)
res
save(res, file=resfn)
resDIVALIKE = res
} else {
# Loads to "res"
load(resfn)
resDIVALIKE = res
}
#######################################################
# Run DIVALIKE+J
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size
BioGeoBEARS_run_object$min_branchlength = 0.000001 # Min to treat tip as a direct ancestor (no speciation event)
BioGeoBEARS_run_object$include_null_range = TRUE # set to FALSE for e.g. DEC* model, DEC*+J, etc.
# (For DEC* and other "*" models, please cite: Massana, Kathryn A.; Beaulieu,
# Jeremy M.; Matzke, Nicholas J.; O’Meara, Brian C. (2015). Non-null Effects of
# the Null Range in Biogeographic Models: Exploring Parameter Estimation in the
# DEC Model. bioRxiv, http://biorxiv.org/content/early/2015/09/16/026914 )
# Also: search script on "include_null_range" for other places to change
# Set up a time-stratified analysis:
BioGeoBEARS_run_object$timesfn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/timeperiods.txt")
BioGeoBEARS_run_object$dispersal_multipliers_fn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/dispersal_multipliers.txt")
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$areas_adjacency_fn = "areas_adjacency.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.
# Speed options and multicore processing if desired
BioGeoBEARS_run_object$on_NaN_error = -1e50 # returns very low lnL if parameters produce NaN error (underflow check)
BioGeoBEARS_run_object$speedup = TRUE # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$use_optimx = TRUE # if FALSE, use optim() instead of optimx();
# if "GenSA", use Generalized Simulated Annealing, which seems better on high-dimensional
# problems (5+ parameters), but seems to sometimes fail to optimize on simple problems
BioGeoBEARS_run_object$num_cores_to_use = 1
BioGeoBEARS_run_object$force_sparse = FALSE # force_sparse=TRUE causes pathology & isn't much faster at this scale
# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE, fossils_older_than=0.001, cut_fossils=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table
# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE # get ancestral states from optim run
# Set up DIVALIKE+J model
# Get the ML parameter values from the 2-parameter nested model
# (this will ensure that the 3-parameter model always does at least as good)
dstart = resDIVALIKE$outputs@params_table["d","est"]
estart = resDIVALIKE$outputs@params_table["e","est"]
jstart = 0.0001
# Input starting values for d, e
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","init"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","est"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","init"] = estart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","est"] = estart
# Remove subset-sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "2-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "ysv*1/2"
# Allow classic, widespread vicariance; all events equiprobable
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","init"] = 0.5
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","est"] = 0.5
# Add jump dispersal/founder-event speciation
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = jstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = jstart
# Under DIVALIKE+J, the max of "j" should be 2, not 3 (as is default in DEC+J)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","min"] = 0.00001
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","max"] = 1.99999
# Add the manually-constructed, time-stratified list of allowed states
BioGeoBEARS_run_object$lists_of_states_lists_0based = lists_of_states_lists_0based
# Run this to check inputs. Read the error messages if you get them!
BioGeoBEARS_run_object = fix_BioGeoBEARS_params_minmax(BioGeoBEARS_run_object=BioGeoBEARS_run_object)
check_BioGeoBEARS_run(BioGeoBEARS_run_object)
resfn = "Psychotria_DIVALIKE+J_M3_time-stratified_v1.Rdata"
runslow = TRUE
if (runslow)
{
#sourceall("/Dropbox/_njm/__packages/BioGeoBEARS_setup/")
res = bears_optim_run(BioGeoBEARS_run_object)
res
save(res, file=resfn)
resDIVALIKEj = res
} else {
# Loads to "res"
load(resfn)
resDIVALIKEj = res
}
pdffn = "Psychotria_DIVALIKE_vs_DIVALIKE+J_M3_time-stratified_v1.pdf"
pdf(pdffn, height=6, width=6)
#######################################################
# Plot ancestral states - DIVALIKE
#######################################################
analysis_titletxt ="BioGeoBEARS DIVALIKE on Psychotria M3_time-stratified"
# Setup
results_object = resDIVALIKE
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
#######################################################
# Plot ancestral states - DIVALIKE+J
#######################################################
analysis_titletxt ="BioGeoBEARS DIVALIKE+J on Psychotria M3_time-stratified"
# Setup
results_object = resDIVALIKEj
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
# States
res1 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)
#######################################################
#######################################################
# BAYAREALIKE AND BAYAREALIKE+J ANALYSIS
#######################################################
#######################################################
# NOTE: As with DIVA, the BioGeoBEARS BayArea-like model is
# not identical with the full Bayesian model implemented
# in the "BayArea" program of Landis et al. (2013).
#
# Instead, this is a simplified likelihood interpretation
# of the model. Basically, in BayArea and BioGeoBEARS-BAYAREALIKE,
# "d" and "e" work like they do in the DEC model of Lagrange
# (and BioGeoBEARS), and then BayArea's cladogenesis assumption
# (which is that nothing in particular happens at cladogenesis) is
# replicated by BioGeoBEARS.
#
# This leaves out 3 important things that are in BayArea:
# 1. Distance dependence (you can add this with a distances
# matrix + the "x" parameter in BioGeoBEARS, however)
# 2. A correction for disallowing "e" events that drive
# a species extinct (a null geographic range)
# 3. The neat Bayesian sampling of histories, which allows
# analyses on large numbers of areas.
#
# The main purpose of having a "BAYAREALIKE" model is
# to test the importance of the cladogenesis model on
# particular datasets. Does it help or hurt the data
# likelihood if there is no special cladogenesis process?
#
# BAYAREALIKE is a likelihood interpretation of BayArea,
# and it is "like BayArea" -- similar to, but not
# identical to, Bayesian BayArea.
# I thus now call the model "BAYAREALIKE", and you should also. ;-)
#######################################################
#######################################################
#######################################################
# Run BAYAREALIKE
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size
BioGeoBEARS_run_object$min_branchlength = 0.000001 # Min to treat tip as a direct ancestor (no speciation event)
BioGeoBEARS_run_object$include_null_range = TRUE # set to FALSE for e.g. DEC* model, DEC*+J, etc.
# (For DEC* and other "*" models, please cite: Massana, Kathryn A.; Beaulieu,
# Jeremy M.; Matzke, Nicholas J.; O’Meara, Brian C. (2015). Non-null Effects of
# the Null Range in Biogeographic Models: Exploring Parameter Estimation in the
# DEC Model. bioRxiv, http://biorxiv.org/content/early/2015/09/16/026914 )
# Also: search script on "include_null_range" for other places to change
# Set up a time-stratified analysis:
BioGeoBEARS_run_object$timesfn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/timeperiods.txt")
BioGeoBEARS_run_object$dispersal_multipliers_fn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/dispersal_multipliers.txt")
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$areas_adjacency_fn = "areas_adjacency.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.
# Speed options and multicore processing if desired
BioGeoBEARS_run_object$on_NaN_error = -1e50 # returns very low lnL if parameters produce NaN error (underflow check)
BioGeoBEARS_run_object$speedup = TRUE # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$use_optimx = TRUE # if FALSE, use optim() instead of optimx();
# if "GenSA", use Generalized Simulated Annealing, which seems better on high-dimensional
# problems (5+ parameters), but seems to sometimes fail to optimize on simple problems
BioGeoBEARS_run_object$num_cores_to_use = 1
BioGeoBEARS_run_object$force_sparse = FALSE # force_sparse=TRUE causes pathology & isn't much faster at this scale
# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE, fossils_older_than=0.001, cut_fossils=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table
# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE # get ancestral states from optim run
# Set up BAYAREALIKE model
# No subset sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0
# No vicariance
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","est"] = 0.0
# No jump dispersal/founder-event speciation
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = 0.01
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = 0.01
# Adjust linkage between parameters
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "1-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/1"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "1-j"
# Only sympatric/range-copying (y) events allowed, and with
# exact copying (both descendants always the same size as the ancestor)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","init"] = 0.9999
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","est"] = 0.9999
# Add the manually-constructed, time-stratified list of allowed states
BioGeoBEARS_run_object$lists_of_states_lists_0based = lists_of_states_lists_0based
# Run this to check inputs. Read the error messages if you get them!
# Check the inputs; fixing any initial ("init") values outside min/max
BioGeoBEARS_run_object = fix_BioGeoBEARS_params_minmax(BioGeoBEARS_run_object=BioGeoBEARS_run_object)
check_BioGeoBEARS_run(BioGeoBEARS_run_object)
runslow = TRUE
resfn = "Psychotria_BAYAREALIKE_M3_time-stratified_v1.Rdata"
if (runslow)
{
res = bears_optim_run(BioGeoBEARS_run_object)
res
save(res, file=resfn)
resBAYAREALIKE = res
} else {
# Loads to "res"
load(resfn)
resBAYAREALIKE = res
}
#######################################################
# Run BAYAREALIKE+J
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size
BioGeoBEARS_run_object$min_branchlength = 0.000001 # Min to treat tip as a direct ancestor (no speciation event)
BioGeoBEARS_run_object$include_null_range = TRUE # set to FALSE for e.g. DEC* model, DEC*+J, etc.
# (For DEC* and other "*" models, please cite: Massana, Kathryn A.; Beaulieu,
# Jeremy M.; Matzke, Nicholas J.; O’Meara, Brian C. (2015). Non-null Effects of
# the Null Range in Biogeographic Models: Exploring Parameter Estimation in the
# DEC Model. bioRxiv, http://biorxiv.org/content/early/2015/09/16/026914 )
# Also: search script on "include_null_range" for other places to change
# Set up a time-stratified analysis:
BioGeoBEARS_run_object$timesfn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/timeperiods.txt")
BioGeoBEARS_run_object$dispersal_multipliers_fn = paste0(extdata_dir,"/examples/Psychotria_M3strat/BGB/dispersal_multipliers.txt")
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$areas_adjacency_fn = "areas_adjacency.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.
# Speed options and multicore processing if desired
BioGeoBEARS_run_object$on_NaN_error = -1e50 # returns very low lnL if parameters produce NaN error (underflow check)
BioGeoBEARS_run_object$speedup = TRUE # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$use_optimx = "GenSA"
BioGeoBEARS_run_object$num_cores_to_use = 1
BioGeoBEARS_run_object$force_sparse = FALSE # force_sparse=TRUE causes pathology & isn't much faster at this scale
# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE, fossils_older_than=0.001, cut_fossils=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table
# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE # get ancestral states from optim run
# Set up BAYAREALIKE+J model
# Get the ML parameter values from the 2-parameter nested model
# (this will ensure that the 3-parameter model always does at least as good)
dstart = resBAYAREALIKE$outputs@params_table["d","est"]
estart = resBAYAREALIKE$outputs@params_table["e","est"]
jstart = 0.0001
# Input starting values for d, e
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","init"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","est"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","init"] = estart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","est"] = estart
# No subset sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0
# No vicariance
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","est"] = 0.0
# *DO* allow jump dispersal/founder-event speciation (set the starting value close to 0)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = jstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = jstart
# Under BAYAREALIKE+J, the max of "j" should be 1, not 3 (as is default in DEC+J) or 2 (as in DIVALIKE+J)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","max"] = 0.99999
# Adjust linkage between parameters
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "1-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/1"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "1-j"
# Only sympatric/range-copying (y) events allowed, and with
# exact copying (both descendants always the same size as the ancestor)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","init"] = 0.9999
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","est"] = 0.9999
# NOTE (NJM, 2014-04): BAYAREALIKE+J seems to crash on some computers, usually Windows
# machines. I can't replicate this on my Mac machines, but it is almost certainly
# just some precision under-run issue, when optim/optimx tries some parameter value
# just below zero. The "min" and "max" options on each parameter are supposed to
# prevent this, but apparently optim/optimx sometimes go slightly beyond
# these limits. Anyway, if you get a crash, try raising "min" and lowering "max"
# slightly for each parameter:
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","min"] = 0.0000001
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","max"] = 4.9999999
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","min"] = 0.0000001
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","max"] = 4.9999999
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","min"] = 0.00001
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","max"] = 0.99999
# Add the manually-constructed, time-stratified list of allowed states
BioGeoBEARS_run_object$lists_of_states_lists_0based = lists_of_states_lists_0based
# Run this to check inputs. Read the error messages if you get them!
BioGeoBEARS_run_object = fix_BioGeoBEARS_params_minmax(BioGeoBEARS_run_object=BioGeoBEARS_run_object)
check_BioGeoBEARS_run(BioGeoBEARS_run_object)
resfn = "Psychotria_BAYAREALIKE+J_M3_time-stratified_v1.Rdata"
runslow = TRUE
if (runslow)
{
res = bears_optim_run(BioGeoBEARS_run_object)
res
save(res, file=resfn)
resBAYAREALIKEj = res
} else {
# Loads to "res"
load(resfn)
resBAYAREALIKEj = res
}
pdffn = "Psychotria_BAYAREALIKE_vs_BAYAREALIKE+J_M3_time-stratified_v1.pdf"
pdf(pdffn, height=6, width=6)
#######################################################
# Plot ancestral states - BAYAREALIKE
#######################################################
analysis_titletxt ="BioGeoBEARS BAYAREALIKE on Psychotria M3_time-stratified"
# Setup
results_object = resBAYAREALIKE
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
#######################################################
# Plot ancestral states - BAYAREALIKE+J
#######################################################
analysis_titletxt ="BioGeoBEARS BAYAREALIKE+J on Psychotria M3_time-stratified"
# Setup
results_object = resBAYAREALIKEj
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
# States
res1 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)
#########################################################################
#########################################################################
#########################################################################
#########################################################################
#
# CALCULATE SUMMARY STATISTICS TO COMPARE
# DEC, DEC+J, DIVALIKE, DIVALIKE+J, BAYAREALIKE, BAYAREALIKE+J
#
#########################################################################
#########################################################################
#########################################################################
#########################################################################
#########################################################################
#########################################################################
# REQUIRED READING:
#
# Practical advice / notes / basic principles on statistical model
# comparison in general, and in BioGeoBEARS:
# http://phylo.wikidot.com/advice-on-statistical-model-comparison-in-biogeobears
#########################################################################
#########################################################################
# Set up empty tables to hold the statistical results
restable = NULL
teststable = NULL
#######################################################
# Statistics -- DEC vs. DEC+J
#######################################################
# We have to extract the log-likelihood differently, depending on the
# version of optim/optimx
LnL_2 = get_LnL_from_BioGeoBEARS_results_object(resDEC)
LnL_1 = get_LnL_from_BioGeoBEARS_results_object(resDECj)
numparams1 = 3
numparams2 = 2
stats = AICstats_2models(LnL_1, LnL_2, numparams1, numparams2)
stats
# DEC, null model for Likelihood Ratio Test (LRT)
res2 = extract_params_from_BioGeoBEARS_results_object(results_object=resDEC, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)
# DEC+J, alternative model for Likelihood Ratio Test (LRT)
res1 = extract_params_from_BioGeoBEARS_results_object(results_object=resDECj, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)
# The null hypothesis for a Likelihood Ratio Test (LRT) is that two models
# confer the same likelihood on the data. See: Brian O'Meara's webpage:
# http://www.brianomeara.info/tutorials/aic
# ...for an intro to LRT, AIC, and AICc
rbind(res2, res1)
tmp_tests = conditional_format_table(stats)
restable = rbind(restable, res2, res1)
teststable = rbind(teststable, tmp_tests)
#######################################################
# Statistics -- DIVALIKE vs. DIVALIKE+J
#######################################################
# We have to extract the log-likelihood differently, depending on the
# version of optim/optimx
LnL_2 = get_LnL_from_BioGeoBEARS_results_object(resDIVALIKE)
LnL_1 = get_LnL_from_BioGeoBEARS_results_object(resDIVALIKEj)
numparams1 = 3
numparams2 = 2
stats = AICstats_2models(LnL_1, LnL_2, numparams1, numparams2)
stats
# DIVALIKE, null model for Likelihood Ratio Test (LRT)
res2 = extract_params_from_BioGeoBEARS_results_object(results_object=resDIVALIKE, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)
# DIVALIKE+J, alternative model for Likelihood Ratio Test (LRT)
res1 = extract_params_from_BioGeoBEARS_results_object(results_object=resDIVALIKEj, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)
rbind(res2, res1)
conditional_format_table(stats)
tmp_tests = conditional_format_table(stats)
restable = rbind(restable, res2, res1)
teststable = rbind(teststable, tmp_tests)
#######################################################
# Statistics -- BAYAREALIKE vs. BAYAREALIKE+J
#######################################################
# We have to extract the log-likelihood differently, depending on the
# version of optim/optimx
LnL_2 = get_LnL_from_BioGeoBEARS_results_object(resBAYAREALIKE)
LnL_1 = get_LnL_from_BioGeoBEARS_results_object(resBAYAREALIKEj)
numparams1 = 3
numparams2 = 2
stats = AICstats_2models(LnL_1, LnL_2, numparams1, numparams2)
stats
# BAYAREALIKE, null model for Likelihood Ratio Test (LRT)
res2 = extract_params_from_BioGeoBEARS_results_object(results_object=resBAYAREALIKE, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)
# BAYAREALIKE+J, alternative model for Likelihood Ratio Test (LRT)
res1 = extract_params_from_BioGeoBEARS_results_object(results_object=resBAYAREALIKEj, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)
rbind(res2, res1)
conditional_format_table(stats)
tmp_tests = conditional_format_table(stats)
restable = rbind(restable, res2, res1)
teststable = rbind(teststable, tmp_tests)
#########################################################################
# ASSEMBLE RESULTS TABLES: DEC, DEC+J, DIVALIKE, DIVALIKE+J, BAYAREALIKE, BAYAREALIKE+J
#########################################################################
teststable$alt = c("DEC+J", "DIVALIKE+J", "BAYAREALIKE+J")
teststable$null = c("DEC", "DIVALIKE", "BAYAREALIKE")
row.names(restable) = c("DEC", "DEC+J", "DIVALIKE", "DIVALIKE+J", "BAYAREALIKE", "BAYAREALIKE+J")
restable = put_jcol_after_ecol(restable)
restable
# Look at the results!!
restable
teststable
#######################################################
# Save the results tables for later -- check for e.g.
# convergence issues
#######################################################
# Loads to "restable"
save(restable, file="restable_v1.Rdata")
load(file="restable_v1.Rdata")
# Loads to "teststable"
save(teststable, file="teststable_v1.Rdata")
load(file="teststable_v1.Rdata")
# Also save to text files
write.table(restable, file="restable.txt", quote=FALSE, sep="\t")
write.table(unlist_df(teststable), file="teststable.txt", quote=FALSE, sep="\t")
#######################################################
# Model weights of all six models
#######################################################
restable2 = restable
# With AICs:
AICtable = calc_AIC_column(LnL_vals=restable$LnL, nparam_vals=restable$numparams)
restable = cbind(restable, AICtable)
restable_AIC_rellike = AkaikeWeights_on_summary_table(restable=restable, colname_to_use="AIC")
restable_AIC_rellike = put_jcol_after_ecol(restable_AIC_rellike)
restable_AIC_rellike
# With AICcs -- factors in sample size
samplesize = length(tr$tip.label)
AICtable = calc_AICc_column(LnL_vals=restable$LnL, nparam_vals=restable$numparams, samplesize=samplesize)
restable2 = cbind(restable2, AICtable)
restable_AICc_rellike = AkaikeWeights_on_summary_table(restable=restable2, colname_to_use="AICc")
restable_AICc_rellike = put_jcol_after_ecol(restable_AICc_rellike)
restable_AICc_rellike
# Also save to text files
write.table(restable_AIC_rellike, file="restable_AIC_rellike.txt", quote=FALSE, sep="\t")
write.table(restable_AICc_rellike, file="restable_AICc_rellike.txt", quote=FALSE, sep="\t")
# Save with nice conditional formatting
write.table(conditional_format_table(restable_AIC_rellike), file="restable_AIC_rellike_formatted.txt", quote=FALSE, sep="\t")
write.table(conditional_format_table(restable_AICc_rellike), file="restable_AICc_rellike_formatted.txt", quote=FALSE, sep="\t")
#######################################################
# Your results should be:
#######################################################
# > restable
#
# LnL numparams d e j AIC
# DEC -41.71374 2 0.07445543 0.07183810 0.0000000 87.42749
# DEC+J -32.78173 3 0.02486813 0.04091879 0.1980793 71.56346
# DIVALIKE -42.07915 2 0.08494066 0.06236922 0.0000000 88.15831
# DIVALIKE+J -32.64996 3 0.02649726 0.04057528 0.1858708 71.29992
# BAYAREALIKE -48.69602 2 0.09411729 0.24572146 0.0000000 101.39205
# BAYAREALIKE+J -32.88534 3 0.02360782 0.04062444 0.1776749 71.77069
#######################################################
# The p-value of the LRT (Likelihood Ratio Test) tells you whether or not you can reject the
# null hypothesis that DEC and DEC+J confer equal likelihoods on the data
#
# AIC and AICc model weights are also shown, giving a sense of the relative probability of the two models.
#
# (One could easily do model weights between all 6 models, but this is not done here; s
# see Brian O'Meara's AIC webpage: http://www.brianomeara.info/tutorials/aic )
#######################################################
# > teststable
#
# alt null LnLalt LnLnull DFalt DFnull DF Dstatistic pval test tail AIC1
# 1 DEC+J DEC -32.78 -41.71 3 2 1 17.86 2.4e-05 chi-squared one-tailed 71.56
# 11 DIVALIKE+J DIVALIKE -32.65 -42.08 3 2 1 18.86 1.4e-05 chi-squared one-tailed 71.3
# 12 BAYAREALIKE+J BAYAREALIKE -32.89 -48.7 3 2 1 31.62 1.9e-08 chi-squared one-tailed 71.77
# AIC2 AICwt1 AICwt2 AICweight_ratio_model1 AICweight_ratio_model2
# 1 87.43 1.00 0.0004 2785 0.0004
# 11 88.16 1.00 0.0002 4579 0.0002
# 12 101.4 1.00 3.7e-07 2705177 3.7e-07
#######################################################
# Time-stratified Biogeographic Stochastic Mapping (BSM)
#######################################################
model_name = "DEC_M3_timestrat"
res = resDEC
pdffn = paste0("Psychotria_", model_name, "_v1.pdf")
pdf(pdffn, height=6, width=6)
analysis_titletxt = paste0(model_name, " on Psychotria")
# Setup
results_object = res
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)
dev.off() # Turn off PDF
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr) # Plot it
#######################################################
# Stochastic mapping on DEC M3b stratified with islands coming up
#######################################################
clado_events_tables = NULL
ana_events_tables = NULL
lnum = 0
#######################################################
# Get the inputs for Biogeographical Stochastic Mapping
# Note: this can be slow for large state spaces and trees, since
# the independent likelihoods for each branch are being pre-calculated
# E.g., for 10 areas, this requires calculation of a 1024x1024 matrix
# for each branch. On a tree with ~800 tips and thus ~1600 branches, this was about 1.6 gigs
# for storage of "BSM_inputs_file.Rdata".
# Update: 2015-09-23 -- now, if you used multicore functionality for the ML analysis,
# the same settings will be used for get_inputs_for_stochastic_mapping().
#######################################################
BSM_inputs_fn = "BSM_inputs_file.Rdata"
runInputsSlow = TRUE
if (runInputsSlow)
{
# debug:
# cluster_already_open=FALSE; rootedge=FALSE; statenum_bottom_root_branch_1based=NULL; printlevel=1; min_branchlength=0.000001
stochastic_mapping_inputs_list = get_inputs_for_stochastic_mapping(res=res)
save(stochastic_mapping_inputs_list, file=BSM_inputs_fn)
} else {
# Loads to "stochastic_mapping_inputs_list"
load(BSM_inputs_fn)
} # END if (runInputsSlow)
# Check inputs (doesn't work the same on unconstr)
names(stochastic_mapping_inputs_list)
stochastic_mapping_inputs_list$phy2
stochastic_mapping_inputs_list$COO_weights_columnar
stochastic_mapping_inputs_list$unconstr
set.seed(seed=as.numeric(Sys.time()))
runBSMslow = TRUE
if (runBSMslow == TRUE)
{
# Saves to: RES_clado_events_tables.Rdata
# Saves to: RES_ana_events_tables.Rdata
# Bug check:
# stochastic_mapping_inputs_list=stochastic_mapping_inputs_list; maxnum_maps_to_try=100; nummaps_goal=50; maxtries_per_branch=40000; save_after_every_try=TRUE; savedir=getwd(); seedval=12345; wait_before_save=0.01; master_nodenum_toPrint=0
BSM_output = runBSM(res, stochastic_mapping_inputs_list=stochastic_mapping_inputs_list, maxnum_maps_to_try=100, nummaps_goal=50, maxtries_per_branch=40000, save_after_every_try=TRUE, savedir=getwd(), seedval=12345, wait_before_save=0.01, master_nodenum_toPrint=0)
RES_clado_events_tables = BSM_output$RES_clado_events_tables
RES_ana_events_tables = BSM_output$RES_ana_events_tables
} else {
# Load previously saved...
# Loads to: RES_clado_events_tables
load(file="RES_clado_events_tables.Rdata")
# Loads to: RES_ana_events_tables
load(file="RES_ana_events_tables.Rdata")
BSM_output = NULL
BSM_output$RES_clado_events_tables = RES_clado_events_tables
BSM_output$RES_ana_events_tables = RES_ana_events_tables
} # END if (runBSMslow == TRUE)
# Extract BSM output
clado_events_tables = BSM_output$RES_clado_events_tables
ana_events_tables = BSM_output$RES_ana_events_tables
head(clado_events_tables[[1]])
head(ana_events_tables[[1]])
length(clado_events_tables)
length(ana_events_tables)
include_null_range = TRUE
areanames = names(tipranges@df)
areas = areanames
max_range_size = 4
# Note: If you did something to change the states_list from the default given the number of areas, you would
# have to manually make that change here as well! (e.g., areas_allowed matrix, or manual reduction of the states_list)
states_list_0based = rcpp_areas_list_to_states_list(areas=areas, maxareas=max_range_size, include_null_range=include_null_range)
colors_list_for_states = get_colors_for_states_list_0based(areanames=areanames, states_list_0based=states_list_0based, max_range_size=max_range_size, plot_null_range=TRUE)
############################################
# Setup for painting a single stochastic map
############################################
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
stratified = TRUE
clado_events_table = clado_events_tables[[1]]
ana_events_table = ana_events_tables[[1]]
# cols_to_get = names(clado_events_table[,-ncol(clado_events_table)])
# colnums = match(cols_to_get, names(ana_events_table))
# ana_events_table_cols_to_add = ana_events_table[,colnums]
# anagenetic_events_txt_below_node = rep("none", nrow(ana_events_table_cols_to_add))
# ana_events_table_cols_to_add = cbind(ana_events_table_cols_to_add, anagenetic_events_txt_below_node)
# rows_to_get_TF = ana_events_table_cols_to_add$node <= length(tr$tip.label)
# master_table_cladogenetic_events = rbind(ana_events_table_cols_to_add[rows_to_get_TF,], clado_events_table)
############################################
# Open a PDF
############################################
pdffn = paste0(model_name, "_single_stochastic_map_n1.pdf")
pdf(file=pdffn, height=6, width=6)
# Convert the BSM into a modified res object
master_table_cladogenetic_events = clado_events_tables[[1]]
resmod = stochastic_map_states_into_res(res=res, master_table_cladogenetic_events=master_table_cladogenetic_events, stratified=stratified)
plot_BioGeoBEARS_results(results_object=resmod, analysis_titletxt="Stochastic map", addl_params=list("j"), label.offset=0.5, plotwhat="text", cornercoords_loc=scriptdir, root.edge=TRUE, colors_list_for_states=colors_list_for_states, skiptree=FALSE, show.tip.label=TRUE)
# Paint on the branch states
paint_stochastic_map_branches(res=resmod, master_table_cladogenetic_events=master_table_cladogenetic_events, colors_list_for_states=colors_list_for_states, lwd=5, lty=par("lty"), root.edge=TRUE, stratified=stratified)
plot_BioGeoBEARS_results(results_object=resmod, analysis_titletxt="Stochastic map", addl_params=list("j"), plotwhat="text", cornercoords_loc=scriptdir, root.edge=TRUE, colors_list_for_states=colors_list_for_states, skiptree=TRUE, show.tip.label=TRUE)
############################################
# Close PDF
############################################
dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)
#######################################################
# Plot all 50 stochastic maps to PDF
#######################################################
# Setup
include_null_range = include_null_range
areanames = areanames
areas = areanames
max_range_size = max_range_size
states_list_0based = rcpp_areas_list_to_states_list(areas=areas, maxareas=max_range_size, include_null_range=include_null_range)
colors_list_for_states = get_colors_for_states_list_0based(areanames=areanames, states_list_0based=states_list_0based, max_range_size=max_range_size, plot_null_range=TRUE)
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
stratified = stratified
# Loop through the maps and plot to PDF
pdffn = paste0(model_name, "_", length(clado_events_tables), "BSMs_v1.pdf")
pdf(file=pdffn, height=6, width=6)
nummaps_goal = 50
for (i in 1:nummaps_goal)
{
clado_events_table = clado_events_tables[[i]]
analysis_titletxt = paste0(model_name, " - Stochastic Map #", i, "/", nummaps_goal)
plot_BSM(results_object=res, clado_events_table=clado_events_table, stratified=stratified, analysis_titletxt=analysis_titletxt, addl_params=list("j"), label.offset=0.5, plotwhat="text", cornercoords_loc=scriptdir, root.edge=TRUE, colors_list_for_states=colors_list_for_states, show.tip.label=TRUE, include_null_range=include_null_range)
} # END for (i in 1:nummaps_goal)
dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)
#######################################################
# Summarize stochastic map tables
#######################################################
length(clado_events_tables)
length(ana_events_tables)
head(clado_events_tables[[1]][,-20])
tail(clado_events_tables[[1]][,-20])
head(ana_events_tables[[1]])
tail(ana_events_tables[[1]])
areanames = names(tipranges@df)
actual_names = areanames
actual_names
# Get the dmat and times (if any)
dmat_times = get_dmat_times_from_res(res=res, numstates=NULL)
dmat_times
# Extract BSM output
clado_events_tables = BSM_output$RES_clado_events_tables
ana_events_tables = BSM_output$RES_ana_events_tables
# Simulate the source areas
BSMs_w_sourceAreas = simulate_source_areas_ana_clado(res, clado_events_tables, ana_events_tables, areanames)
clado_events_tables = BSMs_w_sourceAreas$clado_events_tables
ana_events_tables = BSMs_w_sourceAreas$ana_events_tables
# Count all anagenetic and cladogenetic events
counts_list = count_ana_clado_events(clado_events_tables, ana_events_tables, areanames, actual_names)
summary_counts_BSMs = counts_list$summary_counts_BSMs
print(conditional_format_table(summary_counts_BSMs))
# Histogram of event counts
hist_event_counts(counts_list, pdffn=paste0(model_name, "_histograms_of_event_counts.pdf"))
#######################################################
# Print counts to files
#######################################################
tmpnames = names(counts_list)
cat("\n\nWriting tables* of counts to tab-delimited text files:\n(* = Tables have dimension=2 (rows and columns). Cubes (dimension 3) and lists (dimension 1) will not be printed to text files.) \n\n")
for (i in 1:length(tmpnames))
{
cmdtxt = paste0("item = counts_list$", tmpnames[i])
eval(parse(text=cmdtxt))
# Skip cubes
if (length(dim(item)) != 2)
{
next()
}
outfn = paste0(tmpnames[i], ".txt")
if (length(item) == 0)
{
cat(outfn, " -- NOT written, *NO* events recorded of this type", sep="")
cat("\n")
} else {
cat(outfn)
cat("\n")
write.table(conditional_format_table(item), file=outfn, quote=FALSE, sep="\t", col.names=TRUE, row.names=TRUE)
} # END if (length(item) == 0)
} # END for (i in 1:length(tmpnames))
cat("...done.\n")
#######################################################
# Check that ML ancestral state/range probabilities and
# the mean of the BSMs approximately line up
#######################################################
library(MultinomialCI) # For 95% CIs on BSM counts
check_ML_vs_BSM(res, clado_events_tables, model_name, tr=NULL, plot_each_node=FALSE, linreg_plot=TRUE, MultinomialCI=TRUE)
Interpreting the outputs of Biogeographic Stochastic Mapping
Interpreting the output of Biogeographic Stochastic Mapping. I had thought that the BSM script was fairly self-explanatory, but based on the questions I have gotten, it seems that many users are overwhelmed by the amount of stuff that prints to the screen. Here is the key: the important stuff for the user is at the very end of the script.
Here is the screen output that you get if you run the above script through to completion. Notice that the key function, doing all of the summary work for you, is count_ana_clado_events():
> clado_events_tables = BSMs_w_sourceAreas$clado_events_tables
> ana_events_tables = BSMs_w_sourceAreas$ana_events_tables
>
> # Count all anagenetic and cladogenetic events
> counts_list = count_ana_clado_events(clado_events_tables, ana_events_tables, areanames, actual_names)
count_ana_clado_events() is counting events over Biogeographical Stochastic Map (BSM) #:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Range-switching dispersal (all observed 'a' dispersals):
means:
K O M H
K 0 0 0 0
O 0 0 0 0
M 0 0 0 0
H 0 0 0 0
standard deviations:
K O M H
K 0 0 0 0
O 0 0 0 0
M 0 0 0 0
H 0 0 0 0
Range-expansion dispersal (all observed 'd' dispersals):
means:
K O M H
K 0 0.12 0.50 0.48
O 0 0 0.54 0.52
M 0 0 0 0
H 0 0.020 0 0
standard deviations:
K O M H
K 0 0.39 0.51 0.50
O 0 0 0.50 0.50
M 0 0 0 0
H 0 0.14 0 0
Anagenetic dispersal (mean of all observed anagenetic 'a' or 'd' dispersals):
means:
K O M H
K 0 0.12 0.50 0.48
O 0 0 0.54 0.52
M 0 0 0 0
H 0 0.020 0 0
standard deviations:
K O M H
K 0 0.39 0.51 0.50
O 0 0 0.50 0.50
M 0 0 0 0
H 0 0.14 0 0
Cladogenetic dispersal (mean of all observed jump 'j' dispersals):
means:
K O M H
K 0 3.06 0 0
O 0.26 0 1.98 0
M 0 0 0 0
H 0 0 0 0
standard deviations:
K O M H
K 0 0.47 0 0
O 0.53 0 0.14 0
M 0 0 0 0
H 0 0 0 0
ALL dispersal (mean of all observed anagenetic 'a', 'd' dispersals, PLUS cladogenetic founder/jump dispersal):
means:
K O M H
K 0 3.18 0.50 0.48
O 0.26 0 2.52 0.52
M 0 0 0 0
H 0 0.020 0 0
standard deviations:
K O M H
K 0 0.39 0.51 0.50
O 0.53 0 0.50 0.50
M 0 0 0 0
H 0 0.14 0 0
Summary of event counts from 50 BSMs:
founder a d e subset vicariance sympatry ALL_disp ana_disp all_ana all_clado
means 5.3 0 2.18 0 0.080 0.080 12.54 7.48 2.18 2.18 18
stdevs 0.61 0 0.44 0 0.34 0.27 0.65 0.61 0.44 0.44 0
sums 265 0 109 0 4 4 627 374 109 109 900
total_events
means 20.18
stdevs 0.44
sums 1009
A series of dispersal count matrices are printed to screen. In the dispersal count matrices, the row headings represent "from" (source areas), and the column headings represent "to" (destination areas).
The mean is the average number of dispersal events of each type, across 50 stochastic maps (or however many BSMs you ran). The second matrix is the standard deviation of these counts. Multiply the standard deviation by +/-1.96 to get the theoretical 95% confidence interval (if the histograms are normal distributions; if non-normal, doing a Bayesian credibility interval, using percentiles on the actual list of counts, would be better).
The first pair of matrices is for range-switching dispersal. Range-switching dispersal is controlled by the parameter 'a', which is fixed to 0 in DEC and related models, so all of these counts are zero.
The second pair of matrices is for for range-expansion dispersal. Range-switching dispersal is controlled by the parameter 'd', which is freely estimated in DEC and related models. If the d parameter is 0, these counts are zero, but otherwise there should be a few events. (In this particular DEC+J model on the time-stratified Hawaiian islands, there are only a few 'd' events, as founder-event dispersal dominates.)
The third pair of matrices adds up both types of anagenetic dispersal. Since 'a' was zero, this equals the 'd' matrices.
The fourth pair of matrices is for for founder-event/jump dispersal (which is cladogenetic). Founder-event/jump dispersal is controlled by the parameter 'j', which is freely estimated in DEC+J and related models. If the j parameter is 0, these counts are zero, but otherwise there should be a some j events. (In this particular DEC+J model on the time-stratified Hawaiian islands, j is quite a strong explainer of the data, as founder-event dispersal dominates.)
The fifth pair of matrices adds up ALL types of dispersal, both anagenetic and cladogenetic.
The last item printed to screen is "Summary of event counts from 50 BSMs". This contains the means and standard deviations of all event types. Readers will note that column "all_clado" (the sum of all cladogenetic events) has a mean of 18 events, with a standard deviation of 0. This is because there are 18 nodes in the Psychotria phylogeny, thus there are always exactly 18 cladogenetic events in every single stochastic map. This means, of course, that sympatric range-copying (e.g. K->K,K) is being counted as an event.
The row "sums" adds up all of the events across the 50 stochastic maps. Dividing the sums by 50 produces the row "means".
Accessing all of the summary tables
Accessing all of the summary tables. All of these tables, as well as the raw counts per stochastic map, are stored in the object counts_list. You can see all of these items with names(counts_list). You can access individual elements with e.g. counts_list$founder_totals_list.
> names(counts_list)
[1] "all_dispersals_counts_cube" "anagenetic_dispersals_counts_cube"
[3] "founder_counts_cube" "ana_counts_cube"
[5] "a_counts_cube" "d_counts_cube"
[7] "e_counts_rectangle" "unique_sub_counts"
[9] "unique_vic_counts" "unique_sym_counts"
[11] "all_dispersals_totals_list" "anagenetic_dispersals_totals_list"
[13] "founder_totals_list" "ana_totals_list"
[15] "a_totals_list" "d_totals_list"
[17] "e_totals_list" "subsetSymp_totals_list"
[19] "vicariance_totals_list" "sympatry_totals_list"
[21] "clado_totals_list" "all_totals_list"
[23] "all_dispersals_counts_fromto_means" "ana_dispersals_counts_fromto_means"
[25] "founder_counts_fromto_means" "a_counts_fromto_means"
[27] "d_counts_fromto_means" "all_dispersals_counts_fromto_sds"
[29] "ana_dispersals_counts_fromto_sds" "founder_counts_fromto_sds"
[31] "a_counts_fromto_sds" "d_counts_fromto_sds"
[33] "summary_counts_BSMs"
> counts_list$founder_totals_list
[1] 5 6 6 5 4 4 5 4 5 5 6 5 6 6 5 5 5 6 6 6 6 5 5 5 5 5 5 5 5 6 5 6 6 5 5 5 6 5 5 6 5 6 5 5 5 5
[47] 6 5 7 5
Accessing the tables storing all of the actual stochastic maps
Accessing the tables storing all of the actual stochastic maps. If you run the script all of the way through to the end, the actual stochastic maps (exact event histories) are saved in these two lists of tables: clado_events_tables, and ana_events_tables.
The object clado_events_tables is a list of 50 tables, containing the cladogenetic range-inheritance events at each nodes. The object ana_events_tables is a list of 50 tables, containing the anagenetic events, on branches where anagenetic events occurred.
(Note that it is perfectly possible to have a stochastic map where no anagenetic events occur, particularly in smaller datasets and those with a strong j weight parameter.)
Readings lists and tables in R
Accessing these tables requires some basic R skillz. I cannot teach these to everyone individually. If you are totally at sea with R, you should probably learn basic R before attempting to manipulate tables.
But, just you get you started, this is how you would access the third table in the list of 50 cladogenetic events tables: clado_events_tables[[3]]. This could be a huge table, so it could be useful to just look at the top or bottom of the table, with head() or tail().
# First few rows of the third clado_events_table:
> head(clado_events_tables[[3]])
node node.type parent_br edge.length ancestor daughter_nds time_bp fossils
110 1 tip 8 0.965685 27 0.0008898 FALSE
21 2 tip 9 0.965685 27 0.0008898 FALSE
31 3 tip 11 1.230219 28 0.0008898 FALSE
41 4 tip 12 1.230219 28 0.0008898 FALSE
51 5 tip 14 1.851022 29 0.0000000 FALSE
61 6 tip 15 1.851022 29 0.0000000 FALSE
label stratum time_top time_bot reltimept piecenum piececlass SUBnode
110 P_hawaiiensis_WaikamoiL1 1 0 0.5 0.5 1 subbranch 1
21 P_mauiensis_Eke 1 0 0.5 0.5 2 subbranch 2
31 P_fauriei2 1 0 0.5 0.5 3 subbranch 3
41 P_hathewayi_1 1 0 0.5 0.5 4 subbranch 4
51 P_kaduana_PuuKukuiAS 1 0 0.5 0.5 5 subbranch 5
61 P_mauiensis_PepeAS 1 0 0.5 0.5 6 subbranch 6
SUBnode.type SUBparent_br SUBedge.length SUBancestor SUBdaughter_nds SUBtime_bp SUBfossils
110 tip 8 0.965685 27 0.0008898 FALSE
21 tip 9 0.965685 27 0.0008898 FALSE
31 tip 11 1.230219 28 0.0008898 FALSE
41 tip 12 1.230219 28 0.0008898 FALSE
51 tip 14 1.851022 29 0.0000000 FALSE
61 tip 15 1.851022 29 0.0000000 FALSE
SUBlabel clado_event_type clado_event_txt clado_dispersal_from
110 P_hawaiiensis_WaikamoiL1 <NA> <NA>
21 P_mauiensis_Eke <NA> <NA>
31 P_fauriei2 <NA> <NA>
41 P_hathewayi_1 <NA> <NA>
51 P_kaduana_PuuKukuiAS <NA> <NA>
61 P_mauiensis_PepeAS <NA> <NA>
clado_dispersal_to sampled_states_AT_nodes sampled_states_AT_brbots left_desc_nodes
110 <NA> 4 4 NA
21 <NA> 4 4 NA
31 <NA> 3 3 NA
41 <NA> 3 3 NA
51 <NA> 4 4 NA
61 <NA> 4 4 NA
right_desc_nodes samp_LEFT_dcorner samp_RIGHT_dcorner anagenetic_events_txt_below_node
110 NA NA NA none
21 NA NA NA none
31 NA NA NA none
41 NA NA NA none
51 NA NA NA none
61 NA NA NA none
# Last few rows of the third clado_events_table:
> tail(clado_events_tables[[3]])
node node.type parent_br edge.length ancestor daughter_nds time_bp fossils label stratum
211 21 internal 1 0.2873120 20 22, 35 4.912688 NA inNode21 4
221 22 internal 2 0.7345186 21 23, 31 4.178169 NA inNode22 4
361 36 internal 32 2.3720812 20 37, 19 2.827919 NA inNode36 4
212 21 internal 1 0.2873120 20 22, 35 4.912688 NA inNode21 5
362 36 internal 32 2.3720812 20 37, 19 2.827919 NA inNode36 5
20 20 root NA NA NA 21, 36 5.200000 NA inNode20 5
time_top time_bot reltimept piecenum piececlass SUBnode SUBnode.type SUBparent_br
211 3.7 5.1 1.4 1 subtree 4 root NA
221 3.7 5.1 1.4 1 subtree 5 internal 1
361 3.7 5.1 1.4 2 subbranch 4 tip 6
212 5.1 10.0 4.9 1 subtree 1 tip 1
362 5.1 10.0 4.9 1 subtree 2 tip 2
20 5.1 10.0 4.9 1 subtree 3 root NA
SUBedge.length SUBancestor SUBdaughter_nds SUBtime_bp SUBfossils
211 0.1873120 NA 5, 3 1.2126880 NA
221 0.7345186 4 1, 2 0.4781694 NA
361 1.5000000 5 0.0000000 FALSE
212 0.1000000 3 0.0000000 FALSE
362 0.1000000 3 0.0000000 FALSE
20 NA NA 1, 2 0.1000000 NA
SUBlabel
211 inNode4
221 inNode5
361 P_hexandra_K1,P_hexandra_M,P_hexandra_Oahu
212 P_fauriei2,P_hathewayi_1,P_hawaiiensis_WaikamoiL1,P_kaduana_PuuKukuiAS,P_mauiensis_Eke,P_mauiensis_PepeAS,P_greenwelliae07,P_greenwelliae907,P_kaduana_HawaiiLoa,P_grandiflora_Kal2,P_hobdyi_Kuia,P_hawaiiensis_Makaopuhi,P_mariniana_Kokee2,P_mariniana_MauiNui,P_mariniana_Oahu,P_wawraeDL7428
362 P_hexandra_K1,P_hexandra_M,P_hexandra_Oahu
20 inNode3
clado_event_type clado_event_txt clado_dispersal_from clado_dispersal_to
211 sympatry (y) K->K,K
221 sympatry (y) K->K,K
361 <NA> <NA> <NA>
212
362
20 sympatry (y) K->K,K
sampled_states_AT_nodes sampled_states_AT_brbots left_desc_nodes right_desc_nodes
211 2 2 5 3
221 2 2 1 2
361 2 2 NA NA
212 2 2 NA NA
362 2 2 NA NA
20 2 NA 1 2
samp_LEFT_dcorner samp_RIGHT_dcorner anagenetic_events_txt_below_node
211 2 2 none
221 2 2 none
361 NA NA none
212 NA NA none
362 NA NA none
20 2 2 <NA>
# Names of the columns of one of the clado_events_table:
> names(clado_events_tables[[15]])
[1] "node" "node.type"
[3] "parent_br" "edge.length"
[5] "ancestor" "daughter_nds"
[7] "time_bp" "fossils"
[9] "label" "stratum"
[11] "time_top" "time_bot"
[13] "reltimept" "piecenum"
[15] "piececlass" "SUBnode"
[17] "SUBnode.type" "SUBparent_br"
[19] "SUBedge.length" "SUBancestor"
[21] "SUBdaughter_nds" "SUBtime_bp"
[23] "SUBfossils" "SUBlabel"
[25] "clado_event_type" "clado_event_txt"
[27] "clado_dispersal_from" "clado_dispersal_to"
[29] "sampled_states_AT_nodes" "sampled_states_AT_brbots"
[31] "left_desc_nodes" "right_desc_nodes"
[33] "samp_LEFT_dcorner" "samp_RIGHT_dcorner"
[35] "anagenetic_events_txt_below_node"
# Find out what cladogenetic event happened at the root node (node 20) of the 23rd cladogenetic events table:
# Which row has node 20? Use a TRUE/FALSE (TF) statement:
rootnode_row_TF = clado_events_tables[[23]]$node == 20
# print that row to screen:
> clado_events_tables[[23]][rootnode_row_TF,]
node node.type parent_br edge.length ancestor daughter_nds time_bp fossils label stratum
20 20 root NA NA NA 21, 36 5.2 NA inNode20 5
time_top time_bot reltimept piecenum piececlass SUBnode SUBnode.type SUBparent_br
20 5.1 10 4.9 1 subtree 3 root NA
SUBedge.length SUBancestor SUBdaughter_nds SUBtime_bp SUBfossils SUBlabel clado_event_type
20 NA NA 1, 2 0.1 NA inNode3 sympatry (y)
clado_event_txt clado_dispersal_from clado_dispersal_to sampled_states_AT_nodes
20 K->K,K 2
sampled_states_AT_brbots left_desc_nodes right_desc_nodes samp_LEFT_dcorner
20 NA 1 2 2
samp_RIGHT_dcorner anagenetic_events_txt_below_node
20 2 <NA>
# print the actual event (several equivalent methods):
> clado_events_tables[[23]]$clado_event_txt[rootnode_row_TF]
[1] "K->K,K"
> clado_events_tables[[23]]$clado_event_txt[75]
[1] "K->K,K"
# (since I happen to know in this case, that the root node is on row 75, and the column "clado_event_txt" is column 26
> clado_events_tables[[23]][75,26]
[1] "K->K,K"
The meanings of the column headings of Biogeographic Stochastic Mapping events tables
I am out of time to do a full tutorial, but here is a quick description of what each column in these events tables is supposed to mean.
NOTE: REQUIRED BACKGROUND is an understanding of node numbers in APE's phylo objects. See here:
Node numbering in APE/BioGeoBEARS
http://phylo.wikidot.com/example-biogeobears-scripts#node_numbering
That link also discusses the BioGeoBEARS function prt() (print tree to table), which is also necessary for understanding the BSM event tables output.
The column headings that come from prt() and chainsaw2()
#######################################################
# Reading BSM output
#######################################################
# The key outputs are:
#
# 1. The anagenetic events table:
head(ana_events_tables[[1]])
# 2. The cladogenetic events table:
tail(clado_events_tables[[1]][,-20])
# The first 9 columns of these tables are information about the
# nodes and branch of the tree. These columns are output
# by the BioGeoBEARS function prt() ("print tree to table").
# See here for a mini-tutorial about node numbers and branch
# numbers in APE/R, and about prt():
# http://phylo.wikidot.com/example-biogeobears-scripts#node_numbering
# MEANINGS OF PRT() columns:
# (Note: Like biologists/paleontologists, we are assuming tips are the top
# of the tree, and the root is at the bottom. Computer scientists often
# do it backwards.)
#
# node - The node number used in the standard APE phylo object
# node.type - Is the node "tip", "internal", or "root"?
# parent_br - What is the branch number for the branch below the node?
# (parent_br gives you the row of tr$edge and tr$edge.length)
# edge.length - The branch length
# ancestor - The node number at the bottom of the branch below the node
# daughter_nds - The node numbers of the daughters above the node
# time_bp - The time-before-present of the node (where the highest tip is age 0)
# fossils - If a tip, is it a fossil tip? (as determined by the
# fossils_older_than option of prt()
# label - The node name. Equals the tip species if a tip node, or "inNode"+
# node number, if internal.
# tipnames - (optional output) If get_tipnames=TRUE in prt(), this column
# contains the alphabetical, comma-delimited list of all tip species
# that descend from that node. This is useful as a unique
# identifier of a clade or bipartition across trees, which can be a
# huge problem in APE/R (which can relabel node numbers after any
# change, rotation, etc.)
# COLUMNS DEALING WITH TIME-STRATIFICATION
# Background: A time-stratified tree, in R, is a nightmare. Imagine taking a
# phylogeny, then taking a chainsaw to it, cutting across it
# at several different timepoints. Now you have a pile of "tree pieces";
# some of these will be subtrees (almost always with branches below the
# subtree root node), and some will just be segments (or "1-branch trees",
# if you like).
#
# Keeping track of all of this in R pretty much sucks, as the code
# has to pass all of the likelihoods up and down, keep track of which
# branches connect to what, have special code for single-branch segments,
# more special code for fossil branches that don't come up to 0, etc.
# This would all be much easier with an object-oriented approach, but,
# well, I'm stuck with this unless/until I reprogram everything from
# scratch in a new language. (Yes, I should have listened to my advisor
# and done everything in C++, but then we wouldn't have all the nice
# graphics, and I might still be fighting with compilers on some Windows32
# machine instead of doing science.)
#
# Anyhow, the tree-pieces are generated by the BioGeoBEARS function chainsaw2().
# This generates a list of tree-pieces. To keep track of all of this I have a
# big table, which includes the information from prt(), and then has
# similar information for each tree-piece.
# These columns track which stratum a tree piece is in, the beginning and ending of
# that stratum, the relative time point of the node within that stratum, and
# number and type of the tree-piece in the tree-pieces list produced by chainsaw2():
#
# stratum
# time_top
# time_bot
# reltimept
# piecenum
# piececlass
#
#
# The "SUB" columns contain the prt() output, but for the subtree pieces:
# SUBnode
# SUBnode.type
# SUBparent_br
# SUBedge.length
# SUBancestor
# SUBdaughter_nds
# SUBtime_bp
# SUBfossils
# SUBlabel
#
The meanings of the column headings of clado_events_tables
# COLUMNS RECORDING CLADOGENETIC RANGE-INHERITANCE EVENTS
# clado_event_type - What type of event at cladogenesis? (and letter
# for the parameter that BioGeoBEARS uses to control
# the process; see Figure 1 of Matzke (2013, Frontiers
# in Biogeography).
# sympatry (y)
# vicariance (v)
# subset sympatry (s)
# founder-event/jump dispersal (j)
# clado_event_txt - What specific event happened? E.g.:
# K->K,K would be a narrow sympatry event
# ABC->ABC,ABC would be a broad sympatry event
# ABC->A,BC would a vicariance event, etc.
# (NOTE: This is one reason SINGLE-LETTER CODES
# are REQUIRED by BSM)
# clado_dispersal_to - *IF* it's a dispersal event, dispersal to where?
# sampled_states_AT_nodes - What's the state number of the range at the node?
# (1-based index: typically,
# 1=null range
# 2=A
# 3=B
# 4=AB ...for a 2-area system)
# sampled_states_AT_brbots - State number at the bottom of the branch below the node
# left_desc_nodes - State number at top of the left descendant branch
# right_desc_nodes - State number at top of the right descendant branch
# samp_LEFT_dcorner - State number just after speciation of the node
# (bottom of the left descendant branch)
# samp_RIGHT_dcorner - State number just after speciation of the node
# (bottom of the right descendant branch)
# anagenetic_events_txt_below_node - For the branch below the node, records the
# events along the branch, in a plain-text, semicolon-
# delimited format. Use the function
# events_txt_list_into_events_table() to convert
# these to a table of events (as in the object
# ana_events_tables)
#
The meanings of the column headings of ana_events_tables
# COLUMNS RECORDING ANAGENETIC RANGE-INHERITANCE EVENTS
#
# NOTE CAREFULLY: An ana_events_table include cladogenetic information, because it is
# copied from the clado_events_table when clado_events_table$anagenetic_events_txt_below_node
# is converted into an ana_events_table. This helps us to keep track of the states at
# the top and bottom of each branch. However, this information will be COPIED to
# every anagenetic event. So, if there are several events on branch, the same cladogenetic
# information will be copied to every row of the anagenetic events table. SHORT VERSION:
# DON'T COUNT CLADOGENETIC EVENTS FROM THE ANAGENETIC EVENTS TABLE; THAT IS WHAT THE
# CLADOGENETIC EVENTS TABLE IS FOR!
#
# COLUMNS SPECIFIC TO AN ana_events_table (the others are as above)
#
# trynum - The number of tries that occurred before a successful
# simulation of this branch history.
# (If trynum is greater than the user-set maxtries, then
# the history was created with the so-called "manual"
# option that finds the minimum events and puts them
# in random order; see the BioGeoBEARS Google Group.)
# brlen - The branch length (or length of the segment, when time-
# stratified).
# current_rangenum_1based - The 1-based index of the range (the state) before the event.
# new_rangenum_1based - The 1-based index of the range (the state) after the event.
# current_rangetxt - The text version of the range before the event.
# new_rangetxt - The text version of the range after the event.
# abs_event_time - The absolute time of the event (time before the highest
# tip in the overall tree, which marks time=0 my before
# present.
# event_time - Relative event time (within the time stratum).
# event_type - Type of event (currently allowed anagenetic events are
# d, e, and sometimes a).
# event_txt - Event text, eg. A->AB or ABC->BC.
# new_area_num_1based - The 1-based index of the new area (note: area indexes are
# different than range indexes!) for a d or a event.
# lost_area_num_1based - The 1-based index for a lost area for an e event.
# ana_dispersal_from - The source area (important when there are multiple
# possible areas, e.g. ABC -> ABCF. The source area is
# probabilistically chosen according to the dispersal
# constraints etc. used in the model. The function doing this
# is simulate_source_areas_ana_clado(), run near the end of
# the example script.
# dispersal_to - The area being dispersed to (text version).
# extirpation_from - The area being lost (text version).
#
How to convert multi-page PDFs to animations
In terms of producing nice animated GIFs for websites and presentations, I have had the best luck (on my Mac OS X) with ImageMagick: http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=25640 .
On Macs at least, ImageMagick's "convert" function will require that you have ghostscript (gs) installed (http://ghostscript.com/download/gsdnld.html). If you use MacPorts to install ImageMagick, ghostscript should come along with that.
For example, in Terminal, navigate to the directory with your PDF, then:
convert -delay 20 -density 100x100 -dispose previous DEC+J_M3b_strat_50BSMs_v1.pdf -layers coalesce DEC+J_M3b_strat_50BSMs_v1_100x100.gif
I have found that 200x200 is already a pretty big file. Going very high (e.g. 1000x1000) seems to freeze my computer etc.