BioGeoBears

Introduction

(link to this section)

BioGeoBEARS is an R package, authored by Nicholas J. Matzke, that is designed to perform inference of biogeographic history on phylogenies, and also model testing and model choice of the many different possible models of how biogeography may evolve on a phylogeny (dispersal, vicariance, founder-event speciation, DEC, DIVA, BAYAREA, etc.)

"BioGeoBEARS" is short for "BioGeography with Bayesian (and likelihood) Evolutionary Analysis in R Scripts". It implements the LAGRANGE (Ree & Smith 2008) DEC model (2 free parameters) as well as models with fewer or more free parameters. Standard model-testing procedures may then be applied.

The default implementation is actually ML (maximum likelihood), for comparison with LAGRANGE, but there is a Bayesian option making use of LaplacesDemon — contact me at gro.soibmin|ekztam#gro.soibmin|ekztam, if you can't figure it out from the functions.

See Matzke (2013), Frontiers of Biogeography, for an introduction. See also my IBS poster.

Author: Nicholas J. Matzke

See also

(link to this section)

Most important: Example script and most-recent updates/fixes to the R code.
Other example BioGeoBEARS scripts.
See also: Per-event weights of cladogenetic range inheritance events in BioGeoBEARS -- the parameters y, s, v, and j
See also: Biogeographical Stochastic Mapping, w. example code.
See also: Advice on statistical model comparison in BioGeoBEARS.
See also: BioGeoBEARS mistakes to avoid.
See also: BioGeoBEARS warnings to ignore, mostly
See also: BioGeoBEARS errors to NOT ignore
See also: BioGeoBEARS stuff that mostly doesn't work, yet
See also: Publications using BioGeoBEARS.
See also: BioGeoBEARS workshops

Installation

(link to this section)

To install and run BioGeoBEARS, and the required supporting packages rexpokit and cladoRcpp, use install.packages as follows:

#######################################################
# Installing BioGeoBEARS
#######################################################
# Uncomment this command to get everything
# I request that you use the "0-cloud" R repository at "http://cran.rstudio.com" as it is 
# the only one that keeps download statistics. However, this is up to you.
#######################################################
#
# I recommend you install optimx first, with the dependencies=TRUE command, as
# optimx has many dependencies that people sometimes forget to install.

# Install optimx
install.packages("optimx", dependencies=TRUE, repos="http://cran.rstudio.com")

# Install BioGeoBEARS from CRAN 0-cloud:
install.packages("BioGeoBEARS", dependencies=TRUE, repos="http://cran.rstudio.com")

#######################################################
#
# IMPORTANT NOTE: ONCE BIOGEOBEARS IS INSTALLED, 
# USE THE SOURCE() COMMANDS IN THE EXAMPLE SCRIPT AT
#
# http://phylo.wikidot.com/biogeobears#script
#
# TO GET BUG FIXES AND UPDATES.
#
#
#######################################################

The packages can also be downloaded from the CRAN links below. R can be freely downloaded from: R-Project.org

rexpokit: http://cran.r-project.org/web/packages/rexpokit/index.html

cladoRcpp: http://cran.r-project.org/web/packages/cladoRcpp/index.html

BioGeoBEARS: http://cran.r-project.org/web/packages/BioGeoBEARS/index.html

The PhyloWiki page you are reading (http://phylo.wikidot.com/biogeobears) is currently the one I keep most up-to-date. To get the most up-to-date fixes and additions to the code, be sure to run the source() commands in the example script at the bottom of this page. Alternatively, you can save the .R files from those URLs and source() them locally.

At some point (after a major new CRAN update) I will move updates to github (when that is the case, see https://github.com/nmatzke/BioGeoBEARS; it is easier to use library(devtools) and install it directly within R (see the install_github command).

How to get help

(link to this section)

After reading this page and trying the example script, researchers sometimes still have difficulties getting their data to run. This is usually due to either unfamiliarity with R or issues with formatting the input files.

Many of these questions are easily answered, and I have created a listserv for BioGeoBEARS users to ask questions of me and each other. While for the moment I am happy to answer questions on email as well, it is more efficient for everybody if this happens on the listserv, where answers are archived for future users. So don't be shy! Any question or difficulty you have encountered has likely been encountered by someone else also. I will sometimes repost email answers to the google group for the benefit of other users.

Joining the list requires my approval (this avoids spambots). You should be able to request to join the group via google groups, or if not, email me at gro.soibmin|ekztam#gro.soibmin|ekztam to be added. Once you are added, you can post by emailing: biogeobearsATgooglegroups.com .

The listserv is publicly-readable at: https://groups.google.com/forum/#!forum/biogeobears

The introductory welcome message is: https://groups.google.com/forum/#!topic/biogeobears/H6PpmRJNxeg

In short, I recommend:

  1. Work through the example script at the bottom of this page, using the default example files, and reading the comments
  2. Then, copy and modify the script to use your own working directory and data files.
  3. If you have a problem, search the google group.
  4. If you don't find an answer, ask the google group or email me. First, double-check the most common problems: (1) Make sure your R version and packages are up-to-date; (2) use the most recent source() commands in the example script, and (2)Make sure you've included all key commands in the example script. When you post about errors/crashes, it is helpful to give the a description of the problem and the data/setup, the error messages, your operating system, and the results of typing sessionInfo() into R (this prints out the package versions etc. that you have installed).
  5. If none of this works, email me your input files. I have gotten quite good at finding the (sometimes subtle) issues with the input files that can cause problems.

Dispersal probability a function of distance (the "+x" model) (update July 2014)

(link to this section)

Many have noticed that the BioGeoBEARS supermodel contains an "x" parameter that can be used to estimate dispersal probability as a function of distance. The model works, computationally, but such models and their properties have not yet been thoroughly explored, so here are some notes/cautions to guide researchers who wish to try out the model.

  • If a distance matrix is specified, the rates/weights of dispersal processes (e.g., d, j) between two areas (e.g., A, B) will be multiplied by (distance{A to B})^x.
  • By default, x is fixed to 0, which is equivalent to distance having no effect (any distance^0 = 1).
  • To use the distances model, set parameter "x" to be "free". If "x" ends up being e.g. -1, then this amounts to an inverse distance model for dispersal probability. If it is e.g. -2, this would be sometimes-suggested inverse-square law.
  • While the distances model is intuitively very appealing, getting evidence for it might not be so easy. In likelihood-based discrete historical biogeography analyses, we are limited to a relatively small number of areas, and thus to a relatively small number of distances. If you only have e.g. 5 different distances in your geographic setup, you will probably have limited ability estimate the distance-dispersal function. (Note that Landis, Matzke, Moore & Huelsenbeck (2013) did find support for a distance effect in southeast asian Rhododendrons, in a 20-area analysis, using the BayArea program.)
  • So, as always, model-choice procedures (LRT, AIC, AICc, etc.) should be used to determine if your data supports the more complex "+x" model. (See advice on statistical model choice in biogeography, in the last section of this script.)
  • Note that if the distance between A and B is set to 0, and x is e.g. 01, then the modifier is 0.0^-1 = Inf. So, if some areas are touching and others are not, you have to make some decision about what the correct "distance" should be between the touching areas. Your guess is as good as mine. An appealing model would be that there is a dispersal rate between touching areas, and then a modifier that is applied as distance increases above 0. You would get this if you set the distance between touching areas to "1 unit" (since 1 to any power still equals 1).
  • Note the units you use for distance may be important. If distance is in km, and x=-1, then a 100 km distance adjusts dispersal probability by 1/100. If distance was in meters, 100,000 meters would cause an adjustment of 1/100,000. Now, because d, e, etc. are free parameters, in theory they should increase to account for this big decline in dispersal probability, and you should end up with the same likelihoods regardless of the units of distance.
  • However, because we have default limits on d and e, it's possible that some combinations of distance and x would require ML values of d & e that are outside of the limits. And, the fact that d and x probably interact may make the ML search more difficult (although in my experience the +x models do seem to converge — usually.
  • Short version: you should be especially careful about these issues with +x models. Options to reduce the risk of optimization failure include: (1) rescaling your distances, e.g. divide all distances by the largest or smallest distance, so that the largest or smallest distance is 1. (2) widening the limits on d, e, etc. (3) repeating ML searches with different starting points. (4) repeating ML searches with different units on distance.
  • Note that "j" event probabilities are adjusted the same as "d" with the distances model, even though "j" is a weight rather than a rate. j has intrinsic limits (depending on the weights regime set up for other cladogenetic range-changing processes), and changes to j automatically effect the probability of v, s, and y events. So, j may have a more limited ability to adjust in +J+x models than d and e, and rescaling distances may be even more important.
  • You should probably always set "speedup" equal to FALSE with +x models, so that you do the full default optimx search.
  • The format for the distance matrix file is the same as for manual dispersal multipliers. given here: https://groups.google.com/d/msg/biogeobears/XXaJqmI232o/0OOsXBaqNeIJ
  • As you can see from the above, the +x model is new and not yet well-tested, so it should be viewed as experimental. Researchers should THINK about what the +x model is doing and what the results are saying, and do a variety of analyses to test their intuitions. I do believe the model is valid, and people should publish with it if they like, it just should not yet be viewed as a standard model that can be run blindly on defaults.
  • The distances model is being proposed in Matt van Damm & Nicholas J. Matzke, in preparation, email us if interested.

Fossils in BioGeoBEARS (update June 2014)

(link to this section)

As BioGeoBEARS has developed, I have programmed a number of methods for including fossil data. The initial steps are described in chapter 4 of my Ph.D., and additional methods have been developed to help collaborators deal with the specific fossil data available to them. It will be some time before all of these methods are formally described and published, as first I have to get all of my other Ph.D. chapters accepted. However, the methods exist, so they should be used if needed. (Whether or not they are easy to figure out and use is another question — in some cases it will be awhile before I can work up proper tutorials/examples, and sometimes the methods may need further refinement to deal with particular datasets; I am open to collaborations if a researcher finds that useful.)

In the meantime, I will briefly describe the options for fossil data that currently exist in the BioGeoBEARS package / the additional source files which will soon be added to the package. See: Fossil data in biogeographical analysis in BioGeoBEARS

Stochastic mapping under biogeographical models (update May 2014)

(link to this section)

I have created functions for stochastic mapping under the various BioGeoBEARS models, and for plotting the resulting simulated histories. Stochastic mapping simulations are simulations conditional on the observed tree and a given model+model parameters. If you average many stochastic mapping simulations, you should get the same probabilities of each state that are produced in ancestral states estimation. However, the individual stochastic maps constitute possible histories, and it can be useful to count stochastically mapped events, their dates, etc., as it is more straightforward to compare counts and dates of events between analyses, than to compare the ancestral state probabilities.

Here are small preview graphics:

DEC vs. DEC+J ML models on unconstrained ("M0") Psychotria analysis:

stochastic_maps_DEC_vs_DECj_M0_v1_WORKED_ARCHIVE_50dpi.gif

DEC vs. DEC+J ML models on a time-stratified ("M3") Psychotria analysis. Here, lineages cannot occupy islands before they rise above the ocean.

stochastic_maps_DEC_vs_DECj_M3_v1_WORKED_ARCHIVE_yes_dotted_lines_50dpi.gif

Caption: Stochastic mapping of approximately equiprobable alternative histories under each model. Left: DEC model. Right: DEC+J, which includes founder-event speciation. Key: Blue, K=Kauai. Green: O=Oahu. Yellow: M=Maui-Nui. Red: H=Hawaii Big Island. Kauai is the oldest high island (~5.2 Ma), the Big Island is the youngest island (~0.5 Ma).

Here are larger versions of the graphics:

http://phylo.wdfiles.com/local--files/biogeobears/stochastic_maps_DEC_vs_DECj_M0_v1_WORKED_ARCHIVE_100dpi.gif

http://phylo.wdfiles.com/local--files/biogeobears/stochastic_maps_DEC_vs_DECj_M3_v1_WORKED_ARCHIVE_yes_dotted_lines_100dpi.gif

I have not quite uploaded the code for stochastic mapping yet, it is still somewhat experimental and needs both some testing on real problems (email me if interested), further thought about graphics and output formats, and a writeup/tutorial. But be sure to check the source() commands list in this wiki page, that is what I update most often.

Note 2014-05-28: Fixed issue with stochastic mapping not happening on the root branch below subtrees within a stratum. So, adding areas through time works. Removing areas through time may be a slightly different case and should be double-checked at some point.

Note 2014-07-21: I have submitted the basic Biogeographic Stochastic Mapping algorithm and code as an Applications Note in the journal Bioinformatics. The code is here: http://phylo.wikidot.com/local--files/biogeobears/code_data_files.zip . Unzip and run with the .R file beginning "run_".

SSE Simulation results and code (update September 2014)

(link to this section)

DEC, DEC+J, and most methods in use in historical biogeography implicitly assume that the observed tree is the complete tree, and that lineage-level speciation and extinction are independent of the range evolution process*. Models that correct for these factors are SSE (state-dependent speciation-extinction models), such as BiSSE, GeoSSE, and ClaSSE, but the SSE methods have not yet been widely applied in biogeography, probably because of the large number of parameters they involve (which will increase dramatically as the number of areas and the number of ranges increase), and the typical small size of biogeographic datasets.

Nevertheless, it is interesting to ask whether or not SSE-type processes, as well as missing speciation & extinction events in the observed tree, might cause a bias in inference under DEC or DEC+J, or a problem in distinguishing between these models.

It is also interesting to ask under what values of d, e, and j parameter inference, model choice, and ancestral range inference are reliable.

Therefore, in Matzke (2014), I performed forward simulations under the following macroevolutionary models:

1. Yule, birthrate=0.3
2. BD, birthrate=0.3, deathrate=0.1
3. BD, birthrate=0.3, deathrate=0.3 (this one dies out a lot)
4. SSE, birthrate=0.3, deathrate=0, prob(birth)~rangesize^1
5. SSE, birthrate=0.3, deathrate=0.1, prob(birth)~rangesize^1, prob(death)~rangesize^-1
6. SSE, birthrate=0.3, deathrate=0.3, prob(birth)~rangesize^1, prob(death)~rangesize^-1

The latter three models make speciation more probable in lineages with large geographic ranges, and make extinction less probable in lineages with large geographic ranges.

For each of these macroevolutionary models, geographic range evolved simultaneously with the lineage birth-death process, under a variety of values of d, e and j:

d = 0, 0.03, 0.15
e = 0, 0.03, 0.15
j = 0, 0.02, 0.1, 0.3 (these are per-event weights, not rates)

Parameter combinations were excluded when: deathrate > birthrate, and when e > d. This left 138 parameter combinations. For each of these combinations, the phylogeny+geographical history was simulated 100 times, with each simulation running until the clade had 50 living species.

The complete trees (including extinct lineages) and complete simulated geographic histories were saved, and then the trees were pruned to be "observed" trees. Inference was then conducted under DEC and DEC+J.

These results are discussed in Matzke (2014), Systematic Biology, but the short version is:

1. DEC and DEC+J are highly distinguishable at moderate values** of d and e (0, 0.03) which are often observed in empirical inference. At very high values (d, e=0.15, i.e. on average a d or e event on every branch), this breaks down somewhat, especially for low j, although high j can often still be detected. However, even DEC inference breaks down at high d and e, as would any phylogenetic method where the rates of change are high per branch.

2. Inference of e is always poor, under both DEC+J and DEC (as shown in the original DEC papers)

3. Inference of d and inference of j do not interfere with each other under reasonable parameter values. They are distinguishable. If anything, DEC infers a d too high when the generating model is DEC+J.

4. The effects of BD or SSE on inference appear moderate in most situations.

PDF Files of simulation results

(link to this section)

A. DEC_DECJ_inf_boxplots_ALL.pdf — Simulations when starting range is a single area

B. startABCD_DEC_DECJ_inf_boxplots_ALL.pdf — Simulations when the starting range is all areas

C. simtree_stats_boxplots_ALL2.pdf — Tree statistics, range sizes, number of events, etc., under the simulations. These are in the same order as in A, and correspond to the same parameter combinations.

R functions for SSE simulations

(link to this section)

I thought I had put the SSE simulation code in the Matzke (2014) Dryad archive, but apparently this got dropped between the various rounds of revisions. The code is here as a zipfile: http://phylo.wikidot.com/local--files/biogeobears/Matzke_2014_simcode.zip . Eventually it will be included with documentation in the next major CRAN BioGeoBEARS update. Until then, you can (a) try and figure the functions out (it shouldn't be too hard), or (b) try to get me to help you (more likely if we are collaborating).

Notes

Note * — DEC & DEC+J models do have the process where, if e is positive, range can evolve to 0 size through a range-contraction event. This is usually not inferred, because usually all species in an analysis are living and occupy a nonzero range. Whether or not a continuous-time range-contraction process really causes extinction in real life is debatable. In my simulations, extinction occurs according to the background deathrate (mu), and e can cause additional extinction through the range-contraction process.

So, a "high" rate of d or e is e.g. 50% of the speciation rate. A "low" rate might be 1/10 the speciation rate.

Note ** — The values of d and e only have meaning in relation to the speciation rate. For Hawaiian Psychotria this is ~0.33 (geiger package, birthdeath function). To compare the simulation results to your own data, you would want to estimate your speciation rate, and scale everything mentally. E.g., if your speciation rate is 0.6, your equivalent values of d and e would be doubled from the simulations.

Abbreviations in plots:

birthrate = lambda
deathrate = mu
exponent on rangesize multiplier for speciation = alpha
exponent on rangesize multiplier for extinction = omega

Update to simulation results: Animations

(link to this section)

I have developed plotting routines (not in the package yet) for the above simulations. These plot the biogeographic ranges on top of the tree. Here, line thickness = range size (from 1 to 4 areas). The colors as the same for the stochastic maps, and in this version of the simulations, the simulation always starts in area "A", or, if you like, "K" for "Kauai", colored blue. There is no constraint or stratigraphy in these simulations, however. The color light grey = all 4 areas occupied.

simhistory_allsims_observed_nostates_ALL_ARCHIVE_WORKED_50dpi.gif

Here is a PDF, and a zoomed-in GIF, of the simulations. Because some of the BD trees are very large, all plots have an x-axis of 100 million years for comparison. This does indicate real variability in the simulations, and thus robustness in the conclusions about accurately inferring DEC versus DEC+J models.

http://phylo.wikidot.com/local--files/biogeobears/simhistory_allsims_observed_nostates_ALL_ARCHIVE_WORKED.pdf

http://phylo.wikidot.com/local--files/biogeobears/simhistory_allsims_observed_nostates_ALL_ARCHIVE_WORKED_100dpi.gif

Fixing states at multiple nodes (updates February 2014)

(link to this section)

I have added the capability to fix the state likelihoods at more than one node, for both regular and stratified analyses. Instead of just listing 1 node in fixnode, you can do multiple nodes, e.g. BioGeoBEARS_run_object$fixnode = c(20,21), and BioGeoBEARS_run_object$fixlikes can now take a matrix with a number of rows equal to the length of fixnode. Various other minor improvements in fixnodes analysis were also added. (Update March 2014: I have now put the fixnodes into the uppass calculations as well, although this only matters in rare situations.) To enable the improvements, source the latest code via:

library(BioGeoBEARS)
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_basics_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_generics_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_classes_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_models_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_plots_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_readwrite_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_simulate_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_stratified_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_univ_model_v1.R")
source("http://phylo.wikidot.com/local--files/biogeobears/calc_loglike_sp_v01.R")
calc_loglike_sp = compiler::cmpfun(calc_loglike_sp_prebyte)    # crucial to fix bug in uppass calculations
calc_independent_likelihoods_on_each_branch = compiler::cmpfun(calc_independent_likelihoods_on_each_branch_prebyte)
    # slight speedup hopefully

Fixes for optimx2012/optimx2013/optim issues; bug fix in uppass calculations (update January 2014)

(link to this section)

During the January 7 workshop at the International Biogeography Society Meeting in Canberra, Australia, some Windows users with new versions of R were not able to install the 2012 version of optimx. Switching to the R base function optim() worked, but caused a problem with producing the graphics.

I also noticed a bug in the uppass calculations, which are important for calculating ancestral state probabilities. The downpass calculations dominate the ancestral state probabilities for most nodes, however, especially at the branches attaching to the root, the bug fix may make a difference. This may resolve certain cases where "contradictory" node states and corner states were observed (other cases may be probability ties — check the pie charts — or due to the fact that ancestral states under the global ML model are not the same thing as a single coherent history; see joint ancestral states, Felsenstein 2004, for that sort of estimation method).

I have now fixed all of these problems in the revised example script, posted below. (Update March 2014: I have also fixed the uppass calculations in stratified analyses.) The script includes two source() commands which contained revised functions. Therefore, for the moment, begin all BioGeoBEARS analyses with:

library(optimx)  # (either 2012 or 2013 version, as of January 2014)
library(FD)        # for FD::maxent() (make sure this is up-to-date)
library(snow)     # (if you want to use multicore functionality; prob. better than library(parallel))
library(BioGeoBEARS)
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_basics_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_generics_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_classes_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_models_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_plots_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_readwrite_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_simulate_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_stratified_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_univ_model_v1.R")
source("http://phylo.wikidot.com/local--files/biogeobears/calc_loglike_sp_v01.R")
calc_loglike_sp = compiler::cmpfun(calc_loglike_sp_prebyte)    # crucial to fix bug in uppass calculations
calc_independent_likelihoods_on_each_branch = compiler::cmpfun(calc_independent_likelihoods_on_each_branch_prebyte)
    # slight speedup hopefully

As optimx is currently a dependency of BioGeoBEARS, you will still need some version of optimx to get BioGeoBEARS to load. I will try to make it optional in the next big update.

I have tested the new example script using 2012 optimx, 2013 optimx, and optim(), and all of them work. If you have problems, email me or the BioGeoBEARS google group!

Citing BioGeoBEARS, DEC+J, and other models (update September 27, 2014)

(link to this section)

Citation for the DEC+J model/founder-event speciation, and statistical model selection in historical biogeography:

  1. Matzke, N.J. (2014). "Model Selection in Historical Biogeography Reveals that Founder-event Speciation is a Crucial Process in Island Clades." Systematic Biology. Available online August 14, 2014. http://dx.doi.org/10.1093/sysbio/syu056 http://sysbio.oxfordjournals.org/content/early/2014/09/21/sysbio.syu056.full — Supplementary data on Dryad: http://datadryad.org/resource/doi:10.5061/dryad.2mc1t

On BioGeoBEARS:

  1. Matzke, Nicholas J. (2013). "Probabilistic Historical Biogeography: New Models for Founder-Event Speciation, Imperfect Detection, and Fossils Allow Improved Accuracy and Model-Testing." Frontiers of Biogeography. 5(4), 242-248. (Scholar | Journal)
  2. Also cite R packages that you use (see below).

Other models and pieces of BioGeoBEARS are found in my Ph.D. and various articles in various stages of submission. These are listed below.

I have posted the approved version of my Ph.D. in "Files", below (scroll to the bottom of the page and click "Files"). Here is the reference, for citation purposes. (Note: I've also added the ProQuest version, which is the permanent repository, and open-access. However, email me for supplemental material etc.; I believe I uploaded this to ProQuest, but I don't see how to access it.)

  • Matzke, Nicholas Joseph (2013). Probabilistic Historical Biogeography: New models for founder-event speciation, imperfect detection, and fossils allow improved accuracy and model-testing. Ph.D. thesis, Department Integrative Biology and Designated Emphasis in Computational and Genomic Biology, University of California, Berkeley. Pages 1-240. August 2013. Available at: http://phylo.wikidot.com/local--files/biogeobears/Matzke_PhD_FINAL_v5_w_refs.pdf
  • Matzke, Nicholas J. (2013). Probabilistic Historical Biogeography: New models for founder-event speciation, imperfect detection, and fossils allow improved accuracy and model-testing. Department of Integrative Biology, University of California, Berkeley. August 2013. Order No. 3616487, ProQuest Dissertations and Theses, pp. 1-240. Open-access at: http://search.proquest.com/docview/1526024556

Which items you decide to cite depends on which you used, and what your journal's rules are for citing dissertations, articles in review, etc. Ideally, I would eventually like to see people cite at least the Methods in Ecology and Evolution article for the R package, the Systematic Biology article for the DEC+J model, and the Journal of Biogeography article for the likelihood versions of DIVA, DIVA+J, BAYAREA, and BAYAREA+J, but it is hard to predict when each article will be accepted. In the meantime, users can cited whatever has been published, the thesis itself, or articles in in review/in prep (I am happy to send these if you don't have them).

Apart from journals, it has become common practice to cite R packages that are important in an analysis. Every R package has a CITATION file. Those citations can be found with e.g. "citation("BioGeoBEARS")", and are given below.

The articles and their statuses are:

  1. Matzke, Nicholas J. (2014). "Stochastic mapping in historical biogeography." Bioinformatics. In revision. (draft PDF)
  2. Matzke, Nicholas J. (2014). "BioGeoBEARS: an R package for model testing and ancestral state estimation in historical biogeography." Methods in Ecology and Evolution, revising & improving package documentation for resubmission.
  3. Matzke, Nicholas J. (2014). "Model selection reveals differences in cladogenesis processes operating in island versus continental clades." Journal of Biogeography, in revision.

For the distance-dependent dispersal model:

  1. Van Dam, Matthew; Matzke, Nicholas J. (2014). Evaluating the influence of connectivity and distance on biogeographic patterns in the Southwestern deserts of North America. Journal of Biogeography, in review.

Each article has various supplemental materials etc. I am happy to share them, email me at gro.soibmin|ekztam#gro.soibmin|ekztam if you would like to see those. Thanks! Nick

Citation of R packages

(link to this section)

Apart from journals, every R package has a CITATION file. Those citations can be found with "citation("BioGeoBEARS")", and are given below. To get the current version and date, type e.g. "library(help = BioGeoBEARS)".

Matzke, Nicholas J. (2013). BioGeoBEARS: BioGeography with Bayesian (and Likelihood) Evolutionary Analysis in R Scripts. R package, version 0.2.1, published July 27, 2013 at: http://CRAN.R-project.org/package=BioGeoBEARS

Matzke, Nicholas J., and Sidje, Roger B. (2013). rexpokit: R wrappers for EXPOKIT. R package, version 0.24.2, published July 15, 2013 at: http://CRAN.R-project.org/package=rexpokit

Matzke, Nicholas J. (2013). cladoRcpp: C++ implementations of phylogenetic calculations. R package, version 0.14.2, published July 15, 2013 at: http://CRAN.R-project.org/package=cladoRcpp

Update November 12, 2013: BioGeoBEARS in one slide

(link to this section)

Here is an attempt to represent the BioGeoBEARS supermodel in one slide, comparing it to other historical biogeography methods in use.

NOTE: On smaller/laptop screens, some people aren't seeing the Parameters column. Zoom out if necessary (Apple-minus on a Mac, or View menu) to see the whole graphic. Or go to the graphic by itself.

BioGeoBEARS_supermodel.png

Figure: The processes assumed by different historical biogeography methods. Each of these processes is controlled by the specified parameter(s) in the BioGeoBEARS supermodel, allowing them to be turned on or off, or estimated from the data. Note that whether or not the data support a particular free parameter is an empirical question that should be tested with model choice procedures. Note also that this graphic deals only with the range-changing processes assumed by the different methods. BioGeoBEARS does not attempt to replicate e.g. the parsimony aspect of DIVA, just the processes allowed by DIVA (the BioGeoBEARS "DIVA" model could be called "DIVALIKE" to emphasize that it is a likelihood implementation of the processes assumed by DIVA). Similarly, BioGeoBEARS does not yet implement the "SSE" (state-based speciation and estimation rates) features of the GeoSSE model (Goldberg et al. 2011) of diversity. The ClaSSE model (Goldberg & Igić, 2012) can in theory use a parameter to represent the probability of each possible combination of ancestor range, left descendant range, and right descendant range. In that sense ClaSSE is the ultimate supermodel, although users would have to develop their own parameterizations to produce a reasonable biogeographic model, and the number of parameters inflates dramatically with number of areas — on defaults, 9 areas means 29-1=511 possible ranges, and this means 511x511x511 = 133,432,831 possible combinations of ancestor/left descendant/right descendant. The cladoRcpp R package, a dependency of BioGeoBEARS, is designed to efficiently calculate probabilities for these combinations, under the implemented biogeography models.

Time-stratification, dispersal multiplier matrices, distance matrices, areas allowed matrices, and area-of-areas matrices (update November 8, 2013)

(link to this section)

I introduce how to run analyses including time-stratification, dispersal multiplier matrices, distance matrices, areas allowed matrices, and area-of-areas matrices [in this post on the BioGeoBEARS Google Group].

SCRIPT: EXAMPLE THAT SHOULD RUN OUT OF THE BOX: Hawaiian Psychotria (revised and improved, 2014-07-14)

(link to this section)

#######################################################
# This is an introductory example script for the 
# R package "BioGeoBEARS" by Nick Matzke
# 
# All scripts are copyright Nicholas J. Matzke, 
# please cite if you use. License: GPL-3
# http://cran.r-project.org/web/licenses/GPL-3
# 
# I am happy to answer questions at matzke@nimbios.org, but
# I am more happy to answer questions on the 
# BioGeoBEARS google group
#
# The package is designed for ML and Bayesian inference
# of 
# 
# (a) ancestral geographic ranges, and 
# 
# (b) perhaps more importantly, models for the 
#     evolution of geographic range across a phylogeny.
#
# The example below implements and compares:
# 
# (1) The standard 2-parameter DEC model implemented in 
#     the program LAGRANGE (Ree & Smith 2008); users will
#     notice that the ML parameter inference and log-
#     likelihoods are identical
#
# (2) A DEC+J model implemented in BioGeoBEARS, wherein
#     a third parameter, j, is added, representing the 
#     relative per-event weight of founder-event / jump
#     speciation events at cladogenesis events.  The 
#     higher j is, the more probability these events have,
#     and the less probability the standard LAGRANGE
#     cladogenesis events have.
#
# (3) Some standard model-testing (LRT and AIC) is 
#     implemented at the end so that users may compare models
#
# (4) The script does similar tests of a DIVA-like model (Ronquist 1997)
#     and a BAYAREA-like model (Landis, Matzke, Moore, & Huelsenbeck, 2013)
# 
#######################################################

#######################################################
# Installing BioGeoBEARS
#######################################################
# Uncomment this command to get everything
# Please use the "0-cloud" R repository at "http://cran.rstudio.com" as it is 
# the only one that keeps download statistics
#######################################################
#
#
# # Install BioGeoBEARS from CRAN 0-cloud:
# install.packages("BioGeoBEARS", dependencies=TRUE, repos="http://cran.rstudio.com")
#
#######################################################

#######################################################
# SETUP -- libraries/BioGeoBEARS updates
#######################################################

# Load the package (after installation, see above).
library(optimx)         # You need to have some version of optimx available
                        # as it is a BioGeoBEARS dependency; however, if you
                        # don't want to use optimx, and use optim() (from R core) 
                        # you can set:
                        # BioGeoBEARS_run_object$use_optimx = FALSE
                        # ...everything should work either way -- NJM 2014-01-08
library(FD)       # for FD::maxent() (make sure this is up-to-date)
library(snow)     # (if you want to use multicore functionality; some systems/R versions prefer library(parallel), try either)
library(parallel)
library(BioGeoBEARS)

########################################################
# TO GET THE OPTIMX/OPTIM FIX, AND THE UPPASS FIX, 
# SOURCE THE REVISED FUNCTIONS WITH THESE COMMANDS
########################################################
library(BioGeoBEARS)
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_basics_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_generics_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_classes_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_models_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_plots_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_readwrite_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_simulate_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_stratified_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_univ_model_v1.R")
source("http://phylo.wikidot.com/local--files/biogeobears/calc_loglike_sp_v01.R")
calc_loglike_sp = compiler::cmpfun(calc_loglike_sp_prebyte)    # crucial to fix bug in uppass calculations
calc_independent_likelihoods_on_each_branch = compiler::cmpfun(calc_independent_likelihoods_on_each_branch_prebyte)
    # slight speedup hopefully

#######################################################
# SETUP: YOUR WORKING DIRECTORY
#######################################################
# You will need to set your working directory to match your local system

# Note these very handy functions!
# Command "setwd(x)" sets your working directory
# Command "getwd()" gets your working directory and tells you what it is.
# Command "list.files()" lists the files in your working directory
# To get help on any command, use "?".  E.g., "?list.files"

# Set your working directory for output files
# default here is your home directory ("~")
# Change this as you like
wd = np("~")
setwd(wd)

# Double-check your working directory with getwd()
getwd()

#######################################################
# SETUP: Extension data directory
#######################################################
# When R packages contain extra files, they are stored in the "extdata" directory 
# inside the installed package.
#
# BioGeoBEARS contains various example files and scripts in its extdata directory.
# 
# Each computer operating system might install BioGeoBEARS in a different place, 
# depending on your OS and settings. 
# 
# However, you can find the extdata directory like this:
extdata_dir = np(system.file("extdata", package="BioGeoBEARS"))
extdata_dir
list.files(extdata_dir)

# "system.file" looks in the directory of a specified package (in this case BioGeoBEARS)
# The function "np" is just a shortcut for normalizePath(), which converts the 
# path to the format appropriate for your system (e.g., Mac/Linux use "/", but 
# Windows uses "\\", if memory serves).

# Even when using your own data files, you should KEEP these commands in your 
# script, since the plot_BioGeoBEARS_results function needs a script from the 
# extdata directory to calculate the positions of "corners" on the plot. This cannot
# be made into a straight up BioGeoBEARS function because it uses C routines 
# from the package APE which do not pass R CMD check for some reason.

#######################################################
# SETUP: YOUR TREE FILE AND GEOGRAPHY FILE
#######################################################
# Example files are given below. To run your own data,
# make the below lines point to your own files, e.g.
# trfn = "/mydata/frogs/frogBGB/tree.newick"
# geogfn = "/mydata/frogs/frogBGB/geog.data"

#######################################################
# Phylogeny file
# Notes: 
# 1. Must be binary/bifurcating: no polytomies
# 2. No negative branchlengths (e.g. BEAST MCC consensus trees sometimes have negative branchlengths)
# 3. Be careful of very short branches, as BioGeoBEARS will interpret ultrashort branches as direct ancestors
# 4. You can use non-ultrametric trees, but BioGeoBEARS will interpret any tips significantly below the 
#    top of the tree as fossils!  This is only a good idea if you actually do have fossils in your tree,
#    as in e.g. Wood, Matzke et al. (2013), Systematic Biology.
# 5. The default settings of BioGeoBEARS make sense for trees where the branchlengths are in units of 
#    millions of years, and the tree is 1-1000 units tall. If you have a tree with a total height of
#    e.g. 0.00001, you will need to adjust e.g. the max values of d and e, or (simpler) multiply all
#    your branchlengths to get them into reasonable units.
# 6. DON'T USE SPACES IN SPECIES NAMES, USE E.G. "_"
#######################################################
# This is the example Newick file for Hawaiian Psychotria
# (from Ree & Smith 2008)
# "trfn" = "tree file name"
trfn = np(paste(addslash(extdata_dir), "Psychotria_5.2.newick", sep=""))

# Look at the raw Newick file:
moref(trfn)

# Look at your phylogeny:
tr = read.tree(trfn)
tr
plot(tr)
title("Example Psychotria phylogeny from Ree & Smith (2008)")
axisPhylo() # plots timescale

#######################################################
# Geography file
# Notes:
# 1. This is a PHLYIP-formatted file. This means that in the 
#    first line, 
#    - the 1st number equals the number of rows (species)
#    - the 2nd number equals the number of columns (number of areas)
# 2. This is the same format used for C++ LAGRANGE geography files.
# 3. All names in the geography file must match names in the phylogeny file.
# 4. DON'T USE SPACES IN SPECIES NAMES, USE E.G. "_"
# 5. Operational taxonomic units (OTUs) should ideally be phylogenetic lineages, 
#    i.e. genetically isolated populations.  These may or may not be identical 
#    with species.  You would NOT want to just use specimens, as each specimen 
#    automatically can only live in 1 area, which will typically favor DEC+J 
#    models.  This is fine if the species/lineages really do live in single areas,
#    but you wouldn't want to assume this without thinking about it at least. 
#    In summary, you should collapse multiple specimens into species/lineages if 
#    data indicates they are the same genetic population.
######################################################

# This is the example geography file for Hawaiian Psychotria
# (from Ree & Smith 2008)
geogfn = np(paste(addslash(extdata_dir), "Psychotria_geog.data", sep=""))

# Look at the raw geography text file:
moref(geogfn)

# Look at your geographic range data:
tipranges = getranges_from_LagrangePHYLIP(lgdata_fn=geogfn)
tipranges

# Set the maximum number of areas any species may occupy; this cannot be larger 
# than the number of areas you set up, but it can be smaller.
max_range_size = 4

####################################################
####################################################
# KEY HINT: The number of states (= number of different possible geographic ranges)
# depends on (a) the number of areas and (b) max_range_size.
# If you have more than about 500-600 states, the calculations will get REALLY slow,
# since the program has to exponentiate a matrix of e.g. 600x600.  Often the computer
# will just sit there and crunch, and never get through the calculation of the first
# likelihood.
# 
# (this is also what is usually happening when LAGRANGE hangs: you have too many states!)
#
# To check the number of states for a given number of ranges, try:
numstates_from_numareas(numareas=4, maxareas=4, include_null_range=TRUE)
numstates_from_numareas(numareas=4, maxareas=4, include_null_range=FALSE)
numstates_from_numareas(numareas=4, maxareas=3, include_null_range=TRUE)
numstates_from_numareas(numareas=4, maxareas=2, include_null_range=TRUE)

# Large numbers of areas have problems:
numstates_from_numareas(numareas=10, maxareas=10, include_null_range=TRUE)

# ...unless you limit the max_range_size:
numstates_from_numareas(numareas=10, maxareas=2, include_null_range=TRUE)
####################################################
####################################################

#######################################################
# DEC AND DEC+J ANALYSIS
#######################################################

#######################################################
# Run DEC
#######################################################

BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE          # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$calc_ancprobs=TRUE    # get ancestral states from optim run

# Set up a time-stratified analysis 
# (un-comment to use; see example files in extdata_dir, 
#  and BioGeoBEARS google group posts for further hints)
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.

# Input the maximum range size
BioGeoBEARS_run_object$max_range_size = max_range_size

# Multicore processing if desired
BioGeoBEARS_run_object$num_cores_to_use=1
# (use more cores to speed it up; this requires
# library(parallel) and/or library(snow). Parellel, 
# is default on Macs in R 3.0+, but apparently still 
# has to be typed on Windows machines. Note: apparently 
# parallel works on Mac command-line R, but not R.app.
# BioGeoBEARS checks for this and resets to 1
# core with R.app)

# Sparse matrix exponentiation is an option for huge numbers of ranges/states (600+)
# I have experimented with sparse matrix exponentiation in EXPOKIT/rexpokit,
# but the results are imprecise and so I haven't explored it further.
# In a Bayesian analysis, it might work OK, but the ML point estimates are
# not identical.
# Also, I have not implemented all functions to work with force_sparse=TRUE.
# Volunteers are welcome to work on it!!
BioGeoBEARS_run_object$force_sparse=FALSE

# Give BioGeoBEARS the location of the geography text file
BioGeoBEARS_run_object$geogfn = geogfn

# Give BioGeoBEARS the location of the phylogeny Newick file
BioGeoBEARS_run_object$trfn = trfn

# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)

# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table

# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE

# Set up DEC model
# (nothing to do; defaults)

# Look at the BioGeoBEARS_run_object; it's just a list of settings etc.
BioGeoBEARS_run_object

# This contains the model object
BioGeoBEARS_run_object$BioGeoBEARS_model_object

# This table contains the parameters of the model 
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table

# Run this to check inputs. Read the error messages if you get them!
check_BioGeoBEARS_run(BioGeoBEARS_run_object)

# For a slow analysis, run once, then set runslow=FALSE to just 
# load the saved result.
runslow = TRUE
resfn = "Psychotria_DEC_M0_unconstrained_v1.Rdata"
if (runslow)
    {
    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)
    resDEC = res
    } else {
    # Loads to "res"
    load(resfn)
    resDEC = res
    }

#######################################################
# Run DEC+J
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size

# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.

BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$num_cores_to_use=1

BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE          # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$calc_ancprobs=TRUE    # get ancestral states from optim run

# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)

# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table

# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE

# Set up DEC+J model
# Get the ML parameter values from the 2-parameter nested model
# (this will ensure that the 3-parameter model always does at least as good)
dstart = resDEC$outputs@params_table["d","est"]
estart = resDEC$outputs@params_table["e","est"]
jstart = 0.0001

# Input starting values for d, e
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","init"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","est"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","init"] = estart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","est"] = estart

# Add j as a free parameter
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = jstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = jstart

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

resfn = "Psychotria_DEC+J_M0_unconstrained_v1.Rdata"
runslow = TRUE
if (runslow)
    {
    #sourceall("/Dropbox/_njm/__packages/BioGeoBEARS_setup/")

    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)

    resDECj = res
    } else {
    # Loads to "res"
    load(resfn)
    resDECj = res
    }

#######################################################
# PDF plots
#######################################################
pdffn = "Psychotria_DEC_vs_DEC+J_M0_unconstrained_v1.pdf"
pdf(pdffn, width=8.5, height=11)

#######################################################
# Plot ancestral states - DEC
#######################################################
analysis_titletxt ="BioGeoBEARS DEC on Psychotria M0_unconstrained"

# Setup
results_object = resDEC
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

#######################################################
# Plot ancestral states - DECJ
#######################################################
analysis_titletxt ="BioGeoBEARS DEC+J on Psychotria M0_unconstrained"

# Setup
results_object = resDECj
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res1 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

dev.off()  # Turn off PDF
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr) # Plot it

#######################################################
# DIVALIKE AND DIVALIKE+J ANALYSIS
#######################################################

# NOTE: The BioGeoBEARS "DIVA" model is not identical with 
# Ronquist (1997)'s parsimony DIVA. It is a likelihood
# interpretation of DIVA, constructed by modelling DIVA's
# processes the way DEC does, but only allowing the 
# processes DIVA allows (widespread vicariance: yes; subset
# sympatry: no; see Ronquist & Sanmartin 2011, Figure 4).
#
# I thus now call the model "DIVALIKE", and you should also. ;-)
# 

#######################################################
# Run DIVALIKE
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size

# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.

BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$num_cores_to_use=1

BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE          # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$calc_ancprobs=TRUE    # get ancestral states from optim run

# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)

# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table

# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE

# Set up DIVALIKE model
# Remove subset-sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "2-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "ysv*1/2"

# Allow classic, widespread vicariance; all events equiprobable
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","init"] = 0.5
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","est"] = 0.5

# No jump dispersal/founder-event speciation
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = 0.01
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = 0.01

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

runslow = TRUE
resfn = "Psychotria_DIVALIKE_M0_unconstrained_v1.Rdata"
if (runslow)
    {
    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)
    resDIVALIKE = res
    } else {
    # Loads to "res"
    load(resfn)
    resDIVALIKE = res
    }

#######################################################
# Run DIVALIKE+J
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size

# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.

BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$num_cores_to_use=1

BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE          # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$calc_ancprobs=TRUE    # get ancestral states from optim run

# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)

# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table

# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE

# Set up DIVALIKE+J model
# Get the ML parameter values from the 2-parameter nested model
# (this will ensure that the 3-parameter model always does at least as good)
dstart = resDIVALIKE$outputs@params_table["d","est"]
estart = resDIVALIKE$outputs@params_table["e","est"]
jstart = 0.0001

# Input starting values for d, e
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","init"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","est"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","init"] = estart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","est"] = estart

# Remove subset-sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "2-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "ysv*1/2"

# Allow classic, widespread vicariance; all events equiprobable
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","init"] = 0.5
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","est"] = 0.5

# Add jump dispersal/founder-event speciation
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = jstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = jstart

# Under DIVALIKE+J, the max of "j" should be 2, not 3 (as is default in DEC+J)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","min"] = 0.00001
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","max"] = 1.99999

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

resfn = "Psychotria_DIVALIKE+J_M0_unconstrained_v1.Rdata"
runslow = TRUE
if (runslow)
    {
    #sourceall("/Dropbox/_njm/__packages/BioGeoBEARS_setup/")

    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)

    resDIVALIKEj = res
    } else {
    # Loads to "res"
    load(resfn)
    resDIVALIKEj = res
    }

pdffn = "Psychotria_DIVALIKE_vs_DIVALIKE+J_M0_unconstrained_v1.pdf"
pdf(pdffn, width=8.5, height=11)

#######################################################
# Plot ancestral states - DIVALIKE
#######################################################
analysis_titletxt ="BioGeoBEARS DIVALIKE on Psychotria M0_unconstrained"

# Setup
results_object = resDIVALIKE
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

#######################################################
# Plot ancestral states - DIVALIKE+J
#######################################################
analysis_titletxt ="BioGeoBEARS DIVALIKE+J on Psychotria M0_unconstrained"

# Setup
results_object = resDIVALIKEj
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res1 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)

#######################################################
# BAYAREALIKE AND BAYAREALIKE+J ANALYSIS
#######################################################

# NOTE: As with DIVA, the BioGeoBEARS BayArea-like model is 
# not identical with the full Bayesian model implemented 
# in the "BayArea" program of Landis et al. (2013). 
#
# Instead, this is a simplified likelihood interpretation
# of the model.  Basically, in BayArea and BioGeoBEARS-BAYAREALIKE, 
# "d" and "e" work like they do in the DEC model of Lagrange 
# (and BioGeoBEARS), and then BayArea's cladogenesis assumption
# (which is that nothing in particular happens at cladogenesis) is 
# replicated by BioGeoBEARS.
#
# This leaves out 3 important things that are in BayArea:
# 1. Distance dependence (you can add this with a distances 
#    matrix + the "x" parameter in BioGeoBEARS, however)
# 2. A correction for disallowing "e" events that drive
#    a species extinct (a null geographic range)
# 3. The neat Bayesian sampling of histories, which allows
#    analyses on large numbers of areas.
#
# The main purpose of having a "BAYAREALIKE" model is 
# to test the importance of the cladogenesis model on 
# particular datasets. Does it help or hurt the data 
# likelihood if there is no special cladogenesis process?
# 
# I thus now call the model "BAYAREALIKE", and you should also. ;-)
# 

#######################################################
# Run BAYAREALIKE
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size

# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.

BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$num_cores_to_use=1

BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE          # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$calc_ancprobs=TRUE    # get ancestral states from optim run

# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)

# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table

# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE

# Set up BAYAREALIKE model
# No subset sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0

# No vicariance
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","est"] = 0.0

# No jump dispersal/founder-event speciation
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = 0.01
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = 0.01

# Adjust linkage between parameters
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "1-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/1"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "1-j"

# Only sympatric/range-copying (y) events allowed, and with 
# exact copying (both descendants always the same size as the ancestor)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","init"] = 0.9999
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","est"] = 0.9999

# Check the inputs
check_BioGeoBEARS_run(BioGeoBEARS_run_object)

runslow = TRUE
resfn = "Psychotria_BAYAREALIKE_M0_unconstrained_v1.Rdata"
if (runslow)
    {
    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)
    resBAYAREALIKE = res
    } else {
    # Loads to "res"
    load(resfn)
    resBAYAREALIKE = res
    }

#######################################################
# Run BAYAREALIKE+J
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size

# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
#BioGeoBEARS_run_object$distsfn = "distances_matrix.txt"
# See notes on the distances model on PhyloWiki's BioGeoBEARS updates page.

BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$num_cores_to_use=1

BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE          # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$calc_ancprobs=TRUE    # get ancestral states from optim run

# This function loads the dispersal multiplier matrix etc. from the text files into the model object. Required for these to work!
# (It also runs some checks on these inputs for certain errors.)
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)

# Divide the tree up by timeperiods/strata (uncomment this for stratified analysis)
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)
# The stratified tree is described in this table:
#BioGeoBEARS_run_object$master_table

# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE

# Set up BAYAREALIKE+J model
# Get the ML parameter values from the 2-parameter nested model
# (this will ensure that the 3-parameter model always does at least as good)
dstart = resBAYAREALIKE$outputs@params_table["d","est"]
estart = resBAYAREALIKE$outputs@params_table["e","est"]
jstart = 0.0001

# Input starting values for d, e
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","init"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","est"] = dstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","init"] = estart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","est"] = estart

# No subset sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0

# No vicariance
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","est"] = 0.0

# *DO* allow jump dispersal/founder-event speciation (set the starting value close to 0)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = jstart
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = jstart

# Under BAYAREALIKE+J, the max of "j" should be 1, not 3 (as is default in DEC+J) or 2 (as in DIVALIKE+J)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","max"] = 0.99999

# Adjust linkage between parameters
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "1-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/1"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "1-j"

# Only sympatric/range-copying (y) events allowed, and with 
# exact copying (both descendants always the same size as the ancestor)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","init"] = 0.9999
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","est"] = 0.9999

# NOTE (NJM, 2014-04): BAYAREALIKE+J seems to crash on some computers, usually Windows 
# machines. I can't replicate this on my Mac machines, but it is almost certainly
# just some precision under-run issue, when optim/optimx tries some parameter value 
# just below zero.  The "min" and "max" options on each parameter are supposed to
# prevent this, but apparently optim/optimx sometimes go slightly beyond 
# these limits.  Anyway, if you get a crash, try raising "min" and lowering "max" 
# slightly for each parameter:
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","min"] = 0.0000001
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["d","max"] = 4.9999999

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","min"] = 0.0000001
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["e","max"] = 4.9999999

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","min"] = 0.00001
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","max"] = 0.99999

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

resfn = "Psychotria_BAYAREALIKE+J_M0_unconstrained_v1.Rdata"
runslow = TRUE
if (runslow)
    {
    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)

    resBAYAREALIKEj = res
    } else {
    # Loads to "res"
    load(resfn)
    resBAYAREALIKEj = res
    }

pdffn = "Psychotria_BAYAREALIKE_vs_BAYAREALIKE+J_M0_unconstrained_v1.pdf"
pdf(pdffn, width=8.5, height=11)

#######################################################
# Plot ancestral states - BAYAREALIKE
#######################################################
analysis_titletxt ="BioGeoBEARS BAYAREALIKE on Psychotria M0_unconstrained"

# Setup
results_object = resBAYAREALIKE
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

#######################################################
# Plot ancestral states - BAYAREALIKE+J
#######################################################
analysis_titletxt ="BioGeoBEARS BAYAREALIKE+J on Psychotria M0_unconstrained"

# Setup
results_object = resBAYAREALIKEj
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res1 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)

#########################################################################
#########################################################################
#########################################################################
#########################################################################
# 
# CALCULATE SUMMARY STATISTICS TO COMPARE
# DEC, DEC+J, DIVALIKE, DIVALIKE+J, BAYAREALIKE, BAYAREALIKE+J
# 
#########################################################################
#########################################################################
#########################################################################
#########################################################################

#########################################################################
#########################################################################
# REQUIRED READING:
#
# Practical advice / notes / basic principles on statistical model 
#    comparison in general, and in BioGeoBEARS:
# http://phylo.wikidot.com/advice-on-statistical-model-comparison-in-biogeobears
#########################################################################
#########################################################################

# Set up empty tables to hold the statistical results
restable = NULL
teststable = NULL

#######################################################
# Statistics -- DEC vs. DEC+J
#######################################################
# We have to extract the log-likelihood differently, depending on the 
# version of optim/optimx
LnL_2 = get_LnL_from_BioGeoBEARS_results_object(resDEC)
LnL_1 = get_LnL_from_BioGeoBEARS_results_object(resDECj)

numparams1 = 3
numparams2 = 2
stats = AICstats_2models(LnL_1, LnL_2, numparams1, numparams2)
stats

# DEC, null model for Likelihood Ratio Test (LRT)
res2 = extract_params_from_BioGeoBEARS_results_object(results_object=resDEC, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)
# DEC+J, alternative model for Likelihood Ratio Test (LRT)
res1 = extract_params_from_BioGeoBEARS_results_object(results_object=resDECj, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)

# The null hypothesis for a Likelihood Ratio Test (LRT) is that two models
# confer the same likelihood on the data. See: Brian O'Meara's webpage:
# http://www.brianomeara.info/tutorials/aic
# ...for an intro to LRT, AIC, and AICc

rbind(res2, res1)
tmp_tests = conditional_format_table(stats)

restable = rbind(restable, res2, res1)
teststable = rbind(teststable, tmp_tests)

#######################################################
# Statistics -- DIVALIKE vs. DIVALIKE+J
#######################################################
# We have to extract the log-likelihood differently, depending on the 
# version of optim/optimx
LnL_2 = get_LnL_from_BioGeoBEARS_results_object(resDIVALIKE)
LnL_1 = get_LnL_from_BioGeoBEARS_results_object(resDIVALIKEj)

numparams1 = 3
numparams2 = 2
stats = AICstats_2models(LnL_1, LnL_2, numparams1, numparams2)
stats

# DIVALIKE, null model for Likelihood Ratio Test (LRT)
res2 = extract_params_from_BioGeoBEARS_results_object(results_object=resDIVALIKE, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)
# DIVALIKE+J, alternative model for Likelihood Ratio Test (LRT)
res1 = extract_params_from_BioGeoBEARS_results_object(results_object=resDIVALIKEj, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)

rbind(res2, res1)
conditional_format_table(stats)

tmp_tests = conditional_format_table(stats)

restable = rbind(restable, res2, res1)
teststable = rbind(teststable, tmp_tests)

#######################################################
# Statistics -- BAYAREALIKE vs. BAYAREALIKE+J
#######################################################
# We have to extract the log-likelihood differently, depending on the 
# version of optim/optimx
LnL_2 = get_LnL_from_BioGeoBEARS_results_object(resBAYAREALIKE)
LnL_1 = get_LnL_from_BioGeoBEARS_results_object(resBAYAREALIKEj)

numparams1 = 3
numparams2 = 2
stats = AICstats_2models(LnL_1, LnL_2, numparams1, numparams2)
stats

# BAYAREALIKE, null model for Likelihood Ratio Test (LRT)
res2 = extract_params_from_BioGeoBEARS_results_object(results_object=resBAYAREALIKE, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)
# BAYAREALIKE+J, alternative model for Likelihood Ratio Test (LRT)
res1 = extract_params_from_BioGeoBEARS_results_object(results_object=resBAYAREALIKEj, returnwhat="table", addl_params=c("j"), paramsstr_digits=4)

rbind(res2, res1)
conditional_format_table(stats)

tmp_tests = conditional_format_table(stats)

restable = rbind(restable, res2, res1)
teststable = rbind(teststable, tmp_tests)

#########################################################################
# RESULTS: DEC, DEC+J, DIVALIKE, DIVALIKE+J, BAYAREALIKE, BAYAREALIKE+J
#########################################################################
teststable$alt = c("DEC+J", "DIVALIKE+J", "BAYAREALIKE+J")
teststable$null = c("DEC", "DIVALIKE", "BAYAREALIKE")
row.names(restable) = c("DEC", "DEC+J", "DIVALIKE", "DIVALIKE+J", "BAYAREALIKE", "BAYAREALIKE+J")

# Look at the results!!
restable
teststable

#######################################################
# Save the results tables for later -- check for e.g.
# convergence issues
#######################################################

# Loads to "restable"
save(restable, file="restable_v1.Rdata")
load(file="restable_v1.Rdata")

# Loads to "teststable"
save(teststable, file="teststable_v1.Rdata")
load(file="teststable_v1.Rdata")

#######################################################
# Your results should be:
#######################################################

# > restable
# 
#                 LnL numparams            d            e         j
# DEC           -34.5         2 3.504546e-02 2.835632e-02 0.0000000
# DEC+J         -20.9         3 1.000000e-12 1.000000e-12 0.1142811
# DIVALIKE      -33.1         2 4.474416e-02 1.000000e-12 0.0000000
# DIVALIKE+J    -21.1         3 2.001000e-09 1.000000e-12 0.1157199
# BAYAREALIKE   -40.3         2 1.738085e-02 3.040188e-01 0.0000000
# BAYAREALIKE+J -21.6         3 1.000000e-12 1.000000e-12 0.1081158

#######################################################
# The p-value of the LRT (Likelihood Ratio Test) tells you whether or not you can reject the
# null hypothesis that DEC and DEC+J confer equal likelihoods on the data
#
# AIC and AICc model weights are also shown, giving a sense of the relative probability of the two models.
#
# (One could easily do model weights between all 6 models, but this is not done here; s
#  see Brian O'Meara's AIC webpage: http://www.brianomeara.info/tutorials/aic )
#######################################################

# > teststable
# 
#              alt        null LnLalt LnLnull DFalt DFnull DF Dstatistic    pval        test       tail  AIC1  AIC2 AICwt1  AICwt2 AICweight_ratio_model1 AICweight_ratio_model2
# 1          DEC+J         DEC -20.95  -34.54     3      2  1      27.19 1.8e-07 chi-squared one-tailed  47.9 73.08   1.00 3.4e-06                 294893                3.4e-06
# 11    DIVALIKE+J    DIVALIKE -21.09  -33.15     3      2  1      24.13 9.0e-07 chi-squared one-tailed 48.17  70.3   1.00 1.6e-05                  63797                1.6e-05
# 12 BAYAREALIKE+J BAYAREALIKE -21.09  -33.15     3      2  1      24.13 9.0e-07 chi-squared one-tailed 48.17  70.3   1.00 1.6e-05                  63797                1.6e-05
#

Old updates

(link to this section)

Manual construction of BioGeoBEARS legends (update February 2014)

(link to this section)

I have added a script on how to build your own legend after running the example script. This could then be manually added to a graphic e.g. in Powerpoint, Photoshop or some other graphics editor. here is the link.

BioGeoBEARS Google Group (update October 1, 2013)

(link to this section)

There was substantial interest in the listserv, so I've created one, and added those who were interested who I remembered. It is invite-only so that we don't get robospam. If I forgot you, or you are interested, email me at gro.soibmin|ekztam#gro.soibmin|ekztam to be added. Once you are added, you can post by emailing: biogeobearsATgooglegroups.com .

The listserv is publicly-readable at: https://groups.google.com/forum/#!forum/biogeobears

The introductory welcome message is: https://groups.google.com/forum/#!topic/biogeobears/H6PpmRJNxeg

Interested in listserv? (update September 10, 2013)

(link to this section)

Even while I've been moving, I have gotten many emails about BioGeoBEARS. I am happy to answer them individually, but it might benefit people if there was an email listserv. Email me (gro.soibmin|ekztam#gro.soibmin|ekztam) if you would like me to put you on such a listserv. It would be used for (1) BioGeoBEARS updates and announcements and (2) answers to questions.

Solved issue with "optimx" dependency (resolved Jan. 2014)

(link to this section)

The optimx package, which BioGeoBEARS uses, has been updated and this breaks a final step in the BioGeoBEARS function that assembles the output from optimx. I will fix this when I am able (I have just moved from California to a postdoc at NIMBioS, see National Institute of Mathematical and Biological Synthesis, www.nimbios.org). For now, if you hit an error message (not just a warning) after the maximum likelihood search has run, do the following:

  • 2. Exit R and restart R.
  • 3. Type:
remove.packages("optimx")

…to get rid of the old version of the package.

  • 4. Either (a) use the clickable menus to install the 2012 version of optimx (e.g. in the Macintosh R.app, do: Menu bar —> Packages & Data —> Package Installer —> change "CRAN (binaries)" to "Local Source Package" (or to "Local Package Directory" if you unzipped it) —> click "Install" —> navigate to the 2012 optimx download and click "Open"

…or…

(b) Via command-line, try something like:

install.packages(pkgs="PUT_THE_FULL_PATH_AND_FILENAME_OF_2012_OPTIMX_HERE", repos=NULL, type='source')

library(optimx)
  • 5. You can check which version of optimx you have library-ed by typing:
sessionInfo()

For now: optimx_2012.04.01=GOOD, optimx_2013.08.06=BAD.

The fix is not hard, I will do it once I move into my new office.

UPDATE 7/31/2013: Version 0.2.1 is on CRAN, version 0.2.2-2 is available on github or at the bottom of this page.

UPDATE 7/31/2013: I have updated the Psychotria example to show these models: DEC, DEC+J, DIVA-like, DIVA-like+J, BAYAREA-like, and BAYAREA-like+J.

UPDATE 6/23/2013: I have just uploaded my Evolution 2013 talk, which was Sunday, 11:45 am at the Ernst Mayr symposium. The PDF can be found at the bottom of the file. I have also uploaded a zipped source package for BioGeoBEARS. This is more up-to-date than github. It is the same as what I just submitted to CRAN as 0.1.1.

See also by iEvoBio Lightning Talk, 1:30 pm Tuesday.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License