BioGeoBears

Update March 2014: Simulation results

DEC, DEC+J, and most methods in use in historical biogeography implicitly assume that the observed tree is the complete tree, and that lineage-level speciation and extinction are independent of the range evolution process*. Models that correct for these factors are SSE (state-dependent speciation-extinction models), such as BiSSE, GeoSSE, and ClaSSE, but the SSE methods have not yet been widely applied in biogeography, probably because of the large number of parameters they involve (which will increase dramatically as the number of areas and the number of ranges increase), and the typical small size of biogeographic datasets.

Nevertheless, it is interesting to ask whether or not SSE-type processes, as well as missing speciation & extinction events in the observed tree, might cause a bias in inference under DEC or DEC+J, or a problem in distinguishing between these models.

It is also interesting to ask under what values of d, e, and j parameter inference, model choice, and ancestral range inference are reliable.

Therefore, I performed forward simulations under the following macroevolutionary models:

1. Yule, birthrate=0.3
2. BD, birthrate=0.3, deathrate=0.1
3. BD, birthrate=0.3, deathrate=0.3 (this one dies out a lot)
4. SSE, birthrate=0.3, deathrate=0, prob(birth)~rangesize^1
5. SSE, birthrate=0.3, deathrate=0.1, prob(birth)~rangesize^1, prob(death)~rangesize^-1
6. SSE, birthrate=0.3, deathrate=0.3, prob(birth)~rangesize^1, prob(death)~rangesize^-1

The latter three models make speciation more probable in lineages with large geographic ranges, and make extinction less probable in lineages with large geographic ranges.

For each of these macroevolutionary models, geographic range evolved simultaneously with the lineage birth-death process, under a variety of values of d, e and j:

d = 0, 0.03, 0.15
e = 0, 0.03, 0.15
j = 0, 0.02, 0.1, 0.3 (these are per-event weights, not rates)

Parameter combinations were excluded when: deathrate > birthrate, and when e > d. This left 138 parameter combinations. For each of these combinations, the phylogeny+geographical history was simulated 100 times, with each simulation running until the clade had 50 living species.

The complete trees (including extinct lineages) and complete simulated geographic histories were saved, and then the trees were pruned to be "observed" trees. Inference was then conducted under DEC and DEC+J.

I am writing up these results for a paper, but the short version is:

1. DEC and DEC+J are highly distinguishable at moderate values** of d and e (0, 0.03) which are often observed in empirical inference. At very high values (d, e=0.15, i.e. on average a d or e event on every branch), this breaks down somewhat, especially for low j, although high j can often still be detected. However, even DEC inference breaks down at high d and e, as would any phylogenetic method where the rates of change are high per branch.

2. Inference of e is always poor, under both DEC+J and DEC (as shown in the original DEC papers)

3. Inference of d and inference of j do not interfere with each other under reasonable parameter values. They are distinguishable. If anything, DEC infers a d too high when the generating model is DEC+J.

4. The effects of BD or SSE on inference appear moderate in most situations.

PDF Files of simulation results

A. DEC_DECJ_inf_boxplots_ALL.pdf — Simulations when starting range is a single area

B. startABCD_DEC_DECJ_inf_boxplots_ALL.pdf — Simulations when the starting range is all areas

C. simtree_stats_boxplots_ALL2.pdf — Tree statistics, range sizes, number of events, etc., under the simulations. These are in the same order as in A, and correspond to the same parameter combinations.

Notes

Note * — DEC & DEC+J models do have the process where, if e is positive, range can evolve to 0 size through a range-contraction event. This is usually not inferred, because usually all species in an analysis are living and occupy a nonzero range. Whether or not a continuous-time range-contraction process really causes extinction in real life is debatable. In my simulations, extinction occurs according to the background deathrate (mu), and e can cause additional extinction through the range-contraction process.

So, a "high" rate of d or e is e.g. 50% of the speciation rate. A "low" rate might be 1/10 the speciation rate.

Note ** — The values of d and e only have meaning in relation to the speciation rate. For Hawaiian Psychotria this is ~0.33 (geiger package, birthdeath function). To compare the simulation results to your own data, you would want to estimate your speciation rate, and scale everything mentally. E.g., if your speciation rate is 0.6, your equivalent values of d and e would be doubled from the simulations.

Abbreviations in plots:

birthrate = lambda
deathrate = mu
exponent on rangesize multiplier for speciation = alpha
exponent on rangesize multiplier for extinction = omega

Update February 2014: Manual construction of BioGeoBEARS legends

I have added a script on how to build your own legend at the end of this PhyloWiki page: here is the link.

Updates February 2014: Fixing states at multiple nodes.

I have added the capability to fix the state likelihoods at more than one node, for both regular and stratified analyses. Instead of just listing 1 node in fixnode, you can do multiple nodes, e.g. BioGeoBEARS_run_object$fixnode = c(20,21), and BioGeoBEARS_run_object$fixlikes can now take a matrix with a number of rows equal to the length of fixnode. Various other minor improvements in fixnodes analysis were also added. (Update March 2014: I have now put the fixnodes into the uppass calculations as well, although this only matters in rare situations.) To enable the improvements, source the latest code via:

library(BioGeoBEARS)
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_classes_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_stratified_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_models_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_univ_model_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_plots_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_readwrite_v1.R")
source("http://phylo.wikidot.com/local--files/biogeobears/calc_loglike_sp_v01.R")
calc_loglike_sp = compiler::cmpfun(calc_loglike_sp_prebyte)    # crucial to fix bug in uppass calculations

Updates January 2014: Fixes for optimx2012/optimx2013/optim issues; bug fix in uppass calculations

During the January 7 workshop at the International Biogeography Society Meeting in Canberra, Australia, some Windows users with new versions of R were not able to install the 2012 version of optimx. Switching to the R base function optim() worked, but caused a problem with producing the graphics.

I also noticed a bug in the uppass calculations, which are important for calculating ancestral state probabilities. The downpass calculations dominate the ancestral state probabilities for most nodes, however, especially at the branches attaching to the root, the bug fix may make a difference. This may resolve certain cases where "contradictory" node states and corner states were observed (other cases may be probability ties — check the pie charts — or due to the fact that ancestral states under the global ML model are not the same thing as a single coherent history; see joint ancestral states, Felsenstein 2004, for that sort of estimation method).

I have now fixed all of these problems in the revised example script, posted below. (Update March 2014: I have also fixed the uppass calculations in stratified analyses.) The script includes two source() commands which contained revised functions. Therefore, for the moment, begin all BioGeoBEARS analyses with:

library(optimx)  # (either 2012 or 2013 version, as of January 2014)
library(FD)        # for FD::maxent() (make sure this is up-to-date)
library(snow)     # (if you want to use multicore functionality; prob. better than library(parallel))
library(BioGeoBEARS)
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_classes_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_stratified_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_models_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_univ_model_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_plots_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_readwrite_v1.R")
source("http://phylo.wikidot.com/local--files/biogeobears/calc_loglike_sp_v01.R")
calc_loglike_sp = compiler::cmpfun(calc_loglike_sp_prebyte)    # crucial to fix bug in uppass calculations

As optimx is currently a dependency of BioGeoBEARS, you will still need some version of optimx to get BioGeoBEARS to load. I will try to make it optional in the next big update.

I have tested the new example script using 2012 optimx, 2013 optimx, and optim(), and all of them work. If you have problems, email me or the BioGeoBEARS google group!

Update December 2, 2013: Citations for BioGeoBEARS and associated models

I have posted the approved version of my Ph.D. in "Files", below (scroll to the bottom of the page and click "Files"). Here is the reference, for citation purposes:

  • Matzke, Nicholas Joseph (2013). Probabilistic Historical Biogeography: New Models for Founder-Event Speciation, Imperfect Detection, and Fossils Allow Improved Accuracy and Model-Testing. Ph.D. thesis, Department Integrative Biology and Designated Emphasis in Computational and Genomic Biology, University of California, Berkeley. Pages 1-240. August 2013. Available at: http://phylo.wikidot.com/local--files/biogeobears/Matzke_PhD_FINAL_v5_w_refs.pdf

Citation advice for using BioGeoBEARS: Chapters 1, 2, and 3, as well as the abstract, have been submitted to journals; their current progress is listed in the citations below. These versions are "shorter and sweeter" than the Ph.D., and therefore contain less introductory/background material, but you are also welcome to cite the Ph.D. if you like:

Matzke, Nicholas Joseph (2013). Probabilistic Historical Biogeography: New Models for Founder-Event Speciation, Imperfect Detection, and Fossils Allow Improved Accuracy and Model-Testing. Ph.D. thesis, Department Integrative Biology and Designated Emphasis in Computational and Genomic Biology, University of California, Berkeley. Pages 1-240. August 2013. Available at: http://phylo.wikidot.com/local--files/biogeobears/Matzke_PhD_FINAL_v5_w_refs.pdf

At the moment I have not thought about putting these on arXiv q-bio or bioRxiv, mostly because I'm uncertain what will annoy which journals. I could probably be convinced to investigate further, if someone needs something public to cite.

Which journals you decide to cite depends on which you used, and what your journal's rules are for citing dissertations, articles in review, etc. Ideally, I would like to see people cite at least the Methods in Ecology and Evolution article for the R package, the Systematic Biology article for the DEC+J model, and the Journal of Biogeography article for the likelihood versions of DIVA, DIVA+J, BAYAREA, and BAYAREA+J, but it is hard to predict when each article will be accepted. In the meantime, users who need published works to cite can the thesis itself or the short pieces in Frontiers of Biogeography.

Apart from journals, it has become common practice to cite R packages that are important in an analysis. Every R package has a CITATION file. Those citations can be found with e.g. "citation("BioGeoBEARS")", and are given below.

The articles and their statuses are:

  • Matzke, Nicholas J. (2013). Thesis abstract: Probabilistic Historical Biogeography: New Models for Founder-Event Speciation, Imperfect Detection, and Fossils Allow Improved Accuracy and Model-Testing. Frontiers of Biogeography. 5(4), 242-248. (Scholar | Journal)
  • Matzke, Nicholas J. (2014). "BioGeoBEARS: an R package for model testing and ancestral state estimation in historical biogeography." Methods in Ecology and Evolution, revising & improving package documentation for resubmission.
  • Matzke, Nicholas J. (2014). "Formal Model Testing of the Dispersal-Extinction-Cladogenesis (DEC) Model Reveals that Founder-event Speciation is a Dominant Process Structuring the Biogeography of Island Clades." Systematic Biology, in review.
  • Matzke, Nicholas J. (2014). "Model selection reveals differences in cladogenesis processes operating in island versus continental clades." Journal of Biogeography, in revision.

Each article has various supplemental materials etc. These articles have now been submitted, but different journals have different rules about prepublication so I'm not putting them online for now. I am happy to share on a one-on-one basis, so email me at gro.soibmin|ekztam#gro.soibmin|ekztam if you would like to see those. Thanks! Nick

Citation of R packages

Apart from journals, every R package has a CITATION file. Those citations can be found with "citation("BioGeoBEARS")", and are given below. To get the current version and date, type e.g. "library(help = BioGeoBEARS)".

Matzke, Nicholas J. (2013). BioGeoBEARS: BioGeography with Bayesian (and Likelihood) Evolutionary Analysis in R Scripts. R package, version 0.2.1, published July 27, 2013 at: http://CRAN.R-project.org/package=BioGeoBEARS

Matzke, Nicholas J., and Sidje, Roger B. (2013). ]rexpokit: R wrappers for EXPOKIT. R package, version 0.24.2, published July 15, 2013 at: http://CRAN.R-project.org/package=rexpokit

Matzke, Nicholas J. (2013). cladoRcpp: C++ implementations of phylogenetic calculations. R package, version 0.14.2, published July 15, 2013 at: http://CRAN.R-project.org/package=cladoRcpp

Update November 12, 2013: BioGeoBEARS in one slide

Here is an attempt to represent the BioGeoBEARS supermodel in one slide, comparing it to other historical biogeography methods in use.

NOTE: On smaller/laptop screens, some people aren't seeing the Parameters column. Zoom out if necessary (Apple-minus on a Mac, or View menu) to see the whole graphic. Or go to the graphic by itself.

BioGeoBEARS_supermodel.png

Figure: The processes assumed by different historical biogeography methods. Each of these processes is controlled by the specified parameter(s) in the BioGeoBEARS supermodel, allowing them to be turned on or off, or estimated from the data. Note that whether or not the data support a particular free parameter is an empirical question that should be tested with model choice procedures. Note also that this graphic deals only with the range-changing processes assumed by the different methods. BioGeoBEARS does not attempt to replicate e.g. the parsimony aspect of DIVA, just the processes allowed by DIVA (the BioGeoBEARS "DIVA" model could be called "DIVALIKE" to emphasize that it is a likelihood implementation of the processes assumed by DIVA). Similarly, BioGeoBEARS does not yet implement the "SSE" (state-based speciation and estimation rates) features of the GeoSSE model (Goldberg et al. 2011) of diversity. The ClaSSE model (Goldberg & Igić, 2012) can in theory use a parameter to represent the probability of each possible combination of ancestor range, left descendant range, and right descendant range. In that sense ClaSSE is the ultimate supermodel, although users would have to develop their own parameterizations to produce a reasonable biogeographic model, and the number of parameters inflates dramatically with number of areas — on defaults, 9 areas means 29-1=511 possible ranges, and this means 511x511x511 = 133,432,831 possible combinations of ancestor/left descendant/right descendant. The cladoRcpp R package, a dependency of BioGeoBEARS, is designed to efficiently calculate probabilities for these combinations, under the implemented biogeography models.

Update November 8, 2013: Using time-stratification, dispersal multiplier matrices, distance matrices, areas allowed matrices, and area-of-areas matrices

I introduce how to run analyses including time-stratification, dispersal multiplier matrices, distance matrices, areas allowed matrices, and area-of-areas matrices [in this post on the BioGeoBEARS Google Group].

Update October 1, 2013: Listserv is up!

There was substantial interest in the listserv, so I've created one, and added those who were interested who I remembered. It is invite-only so that we don't get robospam. If I forgot you, or you are interested, email me at gro.soibmin|ekztam#gro.soibmin|ekztam to be added. Once you are added, you can post by emailing: biogeobearsATgooglegroups.com .

The listserv is publicly-readable at: https://groups.google.com/forum/#!forum/biogeobears

The introductory welcome message is: https://groups.google.com/forum/#!topic/biogeobears/H6PpmRJNxeg

Update September 10, 2013: Interested in listserv?

Even while I've been moving, I have gotten many emails about BioGeoBEARS. I am happy to answer them individually, but it might benefit people if there was an email listserv. Email me (gro.soibmin|ekztam#gro.soibmin|ekztam) if you would like me to put you on such a listserv. It would be used for (1) BioGeoBEARS updates and announcements and (2) answers to questions.

Update August 24, 2013: issue with "optimx" dependency (resolved Jan. 2014)

The optimx package, which BioGeoBEARS uses, has been updated and this breaks a final step in the BioGeoBEARS function that assembles the output from optimx. I will fix this when I am able (I have just moved from California to a postdoc at NIMBioS, see National Institute of Mathematical and Biological Synthesis, www.nimbios.org). For now, if you hit an error message (not just a warning) after the maximum likelihood search has run, do the following:

  • 2. Exit R and restart R.
  • 3. Type:
remove.packages("optimx")

…to get rid of the old version of the package.

  • 4. Either (a) use the clickable menus to install the 2012 version of optimx (e.g. in the Macintosh R.app, do: Menu bar —> Packages & Data —> Package Installer —> change "CRAN (binaries)" to "Local Source Package" (or to "Local Package Directory" if you unzipped it) —> click "Install" —> navigate to the 2012 optimx download and click "Open"

…or…

(b) Via command-line, try something like:

install.packages(pkgs="PUT_THE_FULL_PATH_AND_FILENAME_OF_2012_OPTIMX_HERE", repos=NULL, type='source')

library(optimx)
  • 5. You can check which version of optimx you have library-ed by typing:
sessionInfo()

For now: optimx_2012.04.01=GOOD, optimx_2013.08.06=BAD.

The fix is not hard, I will do it once I move into my new office.

Introduction

BioGeoBEARS is an R package, authored by Nicholas J. Matzke, that is designed to perform inference of biogeographic history on phylogenies, and also model testing and model choice of the many different possible models of how biogeography may evolve on a phylogeny (dispersal, vicariance, founder-event speciation, DEC, DIVA, BAYAREA, etc.)

"BioGeoBEARS" is short for "BioGeography with Bayesian Evolutionary Analysis in R Scripts", if you were wondering. It implements the LAGRANGE (Ree & Smith 2008) model (2 free parameters) as well as models with fewer or more free parameters. Standard model-testing procedures may then be applied.

The default implementation is actually ML (maximum likelihood), for comparison with LAGRANGE, but there is a Bayesian option making use of LaplacesDemon — contact me at ude.yelekreb|ekztam#ude.yelekreb|ekztam, if you can't figure it out from the functions.

See also my IBS poster.

Author: Nicholas J. Matzke

Installation

To install and run BioGeoBEARS, and the required supporting packages rexpokit and cladoRcpp, use install.packages, or the links below. R can be freely downloaded from: R-Project.org

rexpokit: http://cran.r-project.org/web/packages/rexpokit/index.html

cladoRcpp: http://cran.r-project.org/web/packages/cladoRcpp/index.html

BioGeoBEARS: http://cran.r-project.org/web/packages/BioGeoBEARS/index.html

A bleeding-edge version of BioGeoBEARS may be available for download on github at: https://github.com/nmatzke/BioGeoBEARS. It is easier to download it from within R (see the install_github command).

Updates

UPDATE 7/31/2013: Version 0.2.1 is on CRAN, version 0.2.2-2 is available on github or at the bottom of this page.

UPDATE 7/31/2013: I have updated the Psychotria example to show these models: DEC, DEC+J, DIVA-like, DIVA-like+J, BAYAREA-like, and BAYAREA-like+J.

UPDATE 6/23/2013: I have just uploaded my Evolution 2013 talk, which was Sunday, 11:45 am at the Ernst Mayr symposium. The PDF can be found at the bottom of the file. I have also uploaded a zipped source package for BioGeoBEARS. This is more up-to-date than github. It is the same as what I just submitted to CRAN as 0.1.1.

See also by iEvoBio Lightning Talk, 1:30 pm Tuesday.

SCRIPT: EXAMPLE THAT SHOULD RUN OUT OF THE BOX: Hawaiian Psychotria (revised and improved, 2014-03-21)

#######################################################
# This is an introductory example script for the 
# R package "BioGeoBEARS" by Nick Matzke
# 
# I am happy to answer questions at matzke@nimbios.org, but
# I am more happy to answer questions on the 
# BioGeoBEARS google group
#
# The package is designed for ML and Bayesian inference
# of 
# 
# (a) ancestral geographic ranges, and 
# 
# (b) perhaps more importantly, models for the 
#     evolution of geographic range across a phylogeny.
#
# The example below implements and compares:
# 
# (1) The standard 2-parameter DEC model implemented in 
#     the program LAGRANGE (Ree & Smith 2008); users will
#     notice that the ML parameter inference and log-
#     likelihoods are identical
#
# (2) A DEC+J model implemented in BioGeoBEARS, wherein
#     a third parameter, j, is added, representing the 
#     relative per-event weight of founder-event / jump
#     speciation events at cladogenesis events.  The 
#     higher j is, the more probability these events have,
#     and the less probability the standard LAGRANGE
#     cladogenesis events have.
#
# (3) Some standard model-testing (LRT and AIC) is 
#     implemented at the end so that users may compare models
#
# (4) The script does similar tests of a DIVA-like model (Ronquist 1997)
#     and a BAYAREA-like model (Landis, Matzke, Moore, & Huelsenbeck, 2013)
# 
#######################################################

#######################################################
# Installing BioGeoBEARS
#######################################################
# Uncomment this command to get everything
# Please use the "0-cloud" R repository at "http://cran.rstudio.com" as it is 
# the only one that keeps download statistics
#######################################################
#
#
# # Install BioGeoBEARS from CRAN 0-cloud:
# install.packages("BioGeoBEARS", dependencies=TRUE, repos="http://cran.rstudio.com")
#
#######################################################

#######################################################
# SETUP -- libraries/BioGeoBEARS updates
#######################################################

# Load the package (after installation, see above).
library(optimx)         # You need to have some version of optimx available
                        # as it is a BioGeoBEARS dependency; however, if you
                        # don't want to use optimx, and use optim() (from R core) 
                        # you can set:
                        # BioGeoBEARS_run_object$use_optimx = FALSE
                        # ...everything should work either way -- NJM 2014-01-08
library(FD)       # for FD::maxent() (make sure this is up-to-date)
library(snow)     # (if you want to use multicore functionality; some systems/R versions prefer library(parallel), try either)
library(parallel)
library(BioGeoBEARS)

########################################################
# TO GET THE OPTIMX/OPTIM FIX, AND THE UPPASS FIX, 
# SOURCE THE REVISED FUNCTIONS WITH THESE COMMANDS
########################################################
library(BioGeoBEARS)
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_classes_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_stratified_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_models_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_univ_model_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_plots_v1.R")
source("http://phylo.wdfiles.com/local--files/biogeobears/BioGeoBEARS_readwrite_v1.R")
source("http://phylo.wikidot.com/local--files/biogeobears/calc_loglike_sp_v01.R")
calc_loglike_sp = compiler::cmpfun(calc_loglike_sp_prebyte)    # crucial to fix bug in uppass calculations

#######################################################
# SETUP: YOUR WORKING DIRECTORY
#######################################################
# You will need to set your working directory to match your local system
# You can find the input files at:
extdata_dir = np(system.file("extdata", package="BioGeoBEARS"))
extdata_dir
list.files(extdata_dir)

# Note these very handy functions!
# Command "setwd(x)" sets your working directory
# Command "getwd()" gets your working directory and tells you what it is.
# Command "list.files()" lists the files in your working directory
# To get help on any command, use "?".  E.g., "?list.files"

# Set your working directory for output files
# default here is your home directory ("~")
# Change this as you like
wd = np("~")
setwd(wd)

# Double-check your working directory with getwd()
getwd()

#######################################################
# SETUP: YOUR TREE FILE AND GEOGRAPHY FILE
#######################################################
# Example files are given below. To run your own data,
# make the below lines point to your own files, e.g.
# trfn = "/mydata/frogs/frogBGB/tree.newick"
# geogfn = "/mydata/frogs/frogBGB/geog.data"

#######################################################
# Phylogeny file
# Notes: 
# 1. Must be binary/bifurcating: no polytomies
# 2. No negative branchlengths (e.g. BEAST MCC consensus trees sometimes have negative branchlengths)
# 3. Be careful of very short branches, as BioGeoBEARS will interpret ultrashort branches as direct ancestors
# 4. You can use non-ultrametric trees, but BioGeoBEARS will interpret any tips significantly below the 
#    top of the tree as fossils!  This is only a good idea if you actually do have fossils in your tree,
#    as in e.g. Wood, Matzke et al. (2013), Systematic Biology.
# 5. The default settings of BioGeoBEARS make sense for trees where the branchlengths are in units of 
#    millions of years, and the tree is 1-1000 units tall. If you have a tree with a total height of
#    e.g. 0.00001, you will need to adjust e.g. the max values of d and e, or (simpler) multiply all
#    your branchlengths to get them into reasonable units.
# 6. DON'T USE SPACES IN SPECIES NAMES, USE E.G. "_"
#######################################################
# This is the example Newick file for Hawaiian Psychotria
# (from Ree & Smith 2008)
# "trfn" = "tree file name"
trfn = np(paste(addslash(extdata_dir), "Psychotria_5.2.newick", sep=""))

# Look at the raw Newick file:
moref(trfn)

# Look at your phylogeny:
tr = read.tree(trfn)
tr
plot(tr)
title("Example Psychotria phylogeny from Ree & Smith (2008)")
axisPhylo() # plots timescale

#######################################################
# Geography file
# Notes:
# 1. This is a PHLYIP-formatted file. This means that in the 
#    first line, 
#    - the 1st number equals the number of rows (species)
#    - the 2nd number equals the number of columns (number of areas)
# 2. This is the same format used for C++ LAGRANGE geography files.
# 3. All names in the geography file must match names in the phylogeny file.
# 4. DON'T USE SPACES IN SPECIES NAMES, USE E.G. "_"
# 5. Operational taxonomic units (OTUs) should ideally be phylogenetic lineages, 
#    i.e. genetically isolated populations.  These may or may not be identical 
#    with species.  You would NOT want to just use specimens, as each specimen 
#    automatically can only live in 1 area, which will typically favor DEC+J 
#    models.  This is fine if the species/lineages really do live in single areas,
#    but you wouldn't want to assume this without thinking about it at least. 
#    In summary, you should collapse multiple specimens into species/lineages if 
#    data indicates they are the same genetic population.
######################################################

# This is the example geography file for Hawaiian Psychotria
# (from Ree & Smith 2008)
geogfn = np(paste(addslash(extdata_dir), "Psychotria_geog.data", sep=""))

# Look at the raw geography text file:
moref(geogfn)

# Look at your geographic range data:
tipranges = getranges_from_LagrangePHYLIP(lgdata_fn=geogfn)
tipranges

# Set the maximum number of areas any species may occupy; this cannot be larger 
# than the number of areas you set up, but it can be smaller.
max_range_size = 4

####################################################
####################################################
# KEY HINT: The number of states (= number of different possible geographic ranges)
# depends on (a) the number of areas and (b) max_range_size.
# If you have more than about 500-600 states, the calculations will get REALLY slow,
# since the program has to exponentiate a matrix of e.g. 600x600.  Often the computer
# will just sit there and crunch, and never get through the calculation of the first
# likelihood.
# 
# (this is also what is usually happening when LAGRANGE hangs: you have too many states!)
#
# To check the number of states for a given number of ranges, try:
numstates_from_numareas(numareas=4, maxareas=4, include_null_range=TRUE)
numstates_from_numareas(numareas=4, maxareas=4, include_null_range=FALSE)
numstates_from_numareas(numareas=4, maxareas=3, include_null_range=TRUE)
numstates_from_numareas(numareas=4, maxareas=2, include_null_range=TRUE)

# Large numbers of areas have problems:
numstates_from_numareas(numareas=10, maxareas=10, include_null_range=TRUE)

# ...unless you limit the max_range_size:
numstates_from_numareas(numareas=10, maxareas=2, include_null_range=TRUE)
####################################################
####################################################

# Set up empty tables to hold the statistical results
restable = NULL
teststable = NULL

#######################################################
# DEC AND DEC+J ANALYSIS
#######################################################

#######################################################
# Run DEC
#######################################################

BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$speedup=TRUE        # seems to work OK
BioGeoBEARS_run_object$calc_ancprobs=TRUE        # get ancestral states from optim run

# Set up a time-stratified analysis 
# (un-comment to use; see example files in extdata_dir, 
#  and BioGeoBEARS google group posts for further hints)
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"

# Input the maximum range size
BioGeoBEARS_run_object$max_range_size = max_range_size

# Multicore processing if desired
BioGeoBEARS_run_object$num_cores_to_use=1
# (use more cores to speed it up; this requires
# library(parallel), which is default on Macs in
# R 3.0+, but apparently still has to be typed
# on Windows machines. Note: apparently parallel
# works on Mac command-line R, but not R.app.
# BioGeoBEARS checks for this and resets to 1
# core with R.app)

# Sparse matrix exponentiation is an option for huge numbers of ranges/states (600+)
# I have experimented with sparse matrix exponentiation in EXPOKIT/rexpokit,
# but the results are imprecise and so I haven't explored it further.
# In a Bayesian analysis, it might work OK, but the ML point estimates are
# not identical.
# Also, I have not implemented all functions to work with force_sparse=TRUE.
# Volunteers are welcome to work on it!!
BioGeoBEARS_run_object$force_sparse=FALSE

# Give BioGeoBEARS the location of the geography text file
BioGeoBEARS_run_object$geogfn = geogfn

# Give BioGeoBEARS the location of the phylogeny Newick file
BioGeoBEARS_run_object$trfn = trfn

# Read in the input files. Read the error messages if you get them!
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by strata (stratified analysis)
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)

# Good default settings to get ancestral states
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE
BioGeoBEARS_run_object$master_table

# Set up DEC model
# (nothing to do; defaults)

# Look at the BioGeoBEARS_run_object; it's just a list of settings etc.
BioGeoBEARS_run_object

# This contains the model object
BioGeoBEARS_run_object$BioGeoBEARS_model_object

# This table contains the parameters of the model 
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table

# Run this to check inputs. Read the error messages if you get them!
check_BioGeoBEARS_run(BioGeoBEARS_run_object)

# For a slow analysis, run once, then set runslow=FALSE to just 
# load the saved result.
runslow = TRUE
resfn = "Psychotria_DEC_M0_unconstrained_v1.Rdata"
if (runslow)
    {
    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)
    resDEC = res
    } else {
    # Loads to "res"
    load(resfn)
    resDEC = res
    }

#######################################################
# Run DEC+J
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size

# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"

BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$num_cores_to_use=1
BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE        # seems to work OK
BioGeoBEARS_run_object$calc_ancprobs=TRUE        # get ancestral states from optim run

# Run to check inputs
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by strata
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)

BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE
BioGeoBEARS_run_object$master_table

# Set up DEC+J model
# Add j as a free parameter
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = 0.01

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

resfn = "Psychotria_DEC+J_M0_unconstrained_v1.Rdata"
runslow = TRUE
if (runslow)
    {
    #sourceall("/Dropbox/_njm/__packages/BioGeoBEARS_setup/")

    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)

    resDECj = res
    } else {
    # Loads to "res"
    load(resfn)
    resDECj = res
    }

#######################################################
# PDF plots
#######################################################
pdffn = "Psychotria_DEC_vs_DEC+J_M0_unconstrained_v1.pdf"
pdf(pdffn, width=8.5, height=11)

#######################################################
# Plot ancestral states - DEC
#######################################################
analysis_titletxt ="BioGeoBEARS DEC on Psychotria M0_unconstrained"

# Setup
results_object = resDEC
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

#######################################################
# Plot ancestral states - DECJ
#######################################################
analysis_titletxt ="BioGeoBEARS DEC+J on Psychotria M0_unconstrained"

# Setup
results_object = resDECj
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res1 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

dev.off()  # Turn off PDF
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr) # Plot it

#######################################################
# Statistics
#######################################################
# We have to extract the log-likelihood differently, depending on the 
# version of optim/optimx
if (BioGeoBEARS_run_object$use_optimx == TRUE)
    {
    # Using optimx() results
    if (packageVersion("optimx") < 2013)
        {
        # optimx 2012
        LnL_2 = as.numeric(resDEC$optim_result$fvalues)
        LnL_1 = as.numeric(resDECj$optim_result$fvalues)
        } else {
        # optimx 2013
        LnL_2 = as.numeric(resDEC$optim_result$value)
        LnL_1 = as.numeric(resDECj$optim_result$value)
        } # end optimx 2012 vs. 2013
    } else {
    # Using optim() results
    LnL_2 = as.numeric(resDEC$optim_result$value)
    LnL_1 = as.numeric(resDECj$optim_result$value)
    } # end optim vs. optimx

numparams1 = 3
numparams2 = 2
stats = AICstats_2models(LnL_1, LnL_2, numparams1, numparams2)
stats

res2    # DEC, null model for Likelihood Ratio Test (LRT)
res1    # DEC+J, alternative model for Likelihood Ratio Test (LRT)

# The null hypothesis for a Likelihood Ratio Test (LRT) is that two models
# confer the same likelihood on the data. See: Brian O'Meara's webpage:
# http://www.brianomeara.info/tutorials/aic
# ...for an intro to LRT, AIC, and AICc

rbind(res2, res1)
tmp_tests = conditional_format_table(stats)

restable = rbind(restable, res2, res1)
teststable = rbind(teststable, tmp_tests)

#######################################################
# DIVALIKE AND DIVALIKE+J ANALYSIS
#######################################################

# NOTE: The BioGeoBEARS "DIVA" model is not identical with 
# Ronquist (1997)'s parsimony DIVA. It is a likelihood
# interpretation of DIVA, constructed by modelling DIVA's
# processes the way DEC does, but only allowing the 
# processes DIVA allows (widespread vicariance: yes; subset
# sympatry: no; see Ronquist & Sanmartin 2011, Figure 4).
#
# I thus now call the model "DIVALIKE", and you should also. ;-)
# 

#######################################################
# Run DIVALIKE
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size

# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"

BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$num_cores_to_use=1
BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE        # seems to work OK
BioGeoBEARS_run_object$calc_ancprobs=TRUE        # get ancestral states from optim run

# Run to check inputs
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by strata
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)

BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE
BioGeoBEARS_run_object$master_table

# Set up DIVALIKE model
# Remove subset-sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "2-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "ysv*1/2"

# Allow classic, widespread vicariance; all events equiprobable
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","init"] = 0.5
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","est"] = 0.5

# No jump dispersal/founder-event speciation
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = 0.01
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = 0.01

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

runslow = TRUE
resfn = "Psychotria_DIVALIKE_M0_unconstrained_v1.Rdata"
if (runslow)
    {
    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)
    resDIVALIKE = res
    } else {
    # Loads to "res"
    load(resfn)
    resDIVALIKE = res
    }

#######################################################
# Run DIVALIKE+J
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size

# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"

BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$num_cores_to_use=1
BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE        # seems to work OK
BioGeoBEARS_run_object$calc_ancprobs=TRUE        # get ancestral states from optim run

# Run to check inputs
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by strata
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)

BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE
BioGeoBEARS_run_object$master_table

# Set up DIVALIKE+J model
# Remove subset-sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "2-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "ysv*1/2"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "ysv*1/2"

# Allow classic, widespread vicariance; all events equiprobable
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","init"] = 0.5
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01v","est"] = 0.5

# Add jump dispersal/founder-event speciation
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = 0.01
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = 0.01

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

resfn = "Psychotria_DIVALIKE+J_M0_unconstrained_v1.Rdata"
runslow = TRUE
if (runslow)
    {
    #sourceall("/Dropbox/_njm/__packages/BioGeoBEARS_setup/")

    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)

    resDIVALIKEj = res
    } else {
    # Loads to "res"
    load(resfn)
    resDIVALIKEj = res
    }

pdffn = "Psychotria_DIVALIKE_vs_DIVALIKE+J_M0_unconstrained_v1.pdf"
pdf(pdffn, width=8.5, height=11)

#######################################################
# Plot ancestral states - DIVALIKE
#######################################################
analysis_titletxt ="BioGeoBEARS DIVALIKE on Psychotria M0_unconstrained"

# Setup
results_object = resDIVALIKE
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

#######################################################
# Plot ancestral states - DIVALIKE+J
#######################################################
analysis_titletxt ="BioGeoBEARS DIVALIKE+J on Psychotria M0_unconstrained"

# Setup
results_object = resDIVALIKEj
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res1 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)

#######################################################
# Stats
#######################################################
# We have to extract the log-likelihood differently, depending on the 
# version of optim/optimx
if (BioGeoBEARS_run_object$use_optimx == TRUE)
    {
    # Using optimx() results
    if (packageVersion("optimx") < 2013)
        {
        # optimx 2012
        LnL_2 = as.numeric(resDIVALIKE$optim_result$fvalues)
        LnL_1 = as.numeric(resDIVALIKEj$optim_result$fvalues)
        } else {
        # optimx 2013
        LnL_2 = as.numeric(resDIVALIKE$optim_result$value)
        LnL_1 = as.numeric(resDIVALIKEj$optim_result$value)
        } # end optimx 2012 vs. 2013
    } else {
    # Using optim() results
    LnL_2 = as.numeric(resDIVALIKE$optim_result$value)
    LnL_1 = as.numeric(resDIVALIKEj$optim_result$value)
    } # end optim vs. optimx

numparams1 = 3
numparams2 = 2
stats = AICstats_2models(LnL_1, LnL_2, numparams1, numparams2)
stats

res2    # DIVALIKE basic results
res1    # DIVALIKE+J basic results

rbind(res2, res1)
conditional_format_table(stats)

tmp_tests = conditional_format_table(stats)

restable = rbind(restable, res2, res1)
teststable = rbind(teststable, tmp_tests)

#######################################################
# BAYAREALIKE AND BAYAREALIKE+J ANALYSIS
#######################################################

# NOTE: As with DIVA, the BioGeoBEARS BayArea-like model is 
# not identical with the full Bayesian model implemented 
# in the "BayArea" program of Landis et al. (2013). 
#
# Instead, this is a simplified likelihood interpretation
# of the model.  Basically, "d" and "e" work like they 
# do in the DEC model of Lagrange (and BioGeoBEARS), and 
# then BayArea's cladogenesis assumption (which is that 
# nothing in particular happens at cladogenesis) is 
# replicated by BioGeoBEARS.
#
# This leaves out 3 important things that are in BayArea:
# 1. Distance dependence (you can add this with a distances 
#    matrix in BioGeoBEARS, however)
# 2. A correction for disallowing "e" events that drive
#    a species extinct (a null geographic range)
# 3. The neat Bayesian sampling of histories, which allows
#    analyses on large numbers of areas.
#
# The main purpose of having a "BAYAREALIKE" model is 
# to test the importance of the cladogenesis model on 
# particular datasets. Does it help or hurt the data 
# likelihood if there is no special cladogenesis process?
# 
# I thus now call the model "BAYAREALIKE", and you should also. ;-)
# 

#######################################################
# Run BAYAREALIKE
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size

# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"

BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$num_cores_to_use=1
BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE        # seems to work OK
BioGeoBEARS_run_object$calc_ancprobs=TRUE        # get ancestral states from optim run

# Run to check inputs
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by strata
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)

BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE
BioGeoBEARS_run_object$master_table

# Set up BAYAREALIKE model
# No subset sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0

# No vicariance
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","est"] = 0.0

# No jump dispersal/founder-event speciation
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = 0.01
# BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = 0.01

# Adjust linkage between parameters
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "1-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/1"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "1-j"

# Only sympatric/range-copying (y) events allowed, and with 
# exact copying (both descendants always the same size as the ancestor)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","init"] = 0.9999
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","est"] = 0.9999

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

runslow = TRUE
resfn = "Psychotria_BAYAREALIKE_M0_unconstrained_v1.Rdata"
if (runslow)
    {
    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)
    resBAYAREALIKE = res
    } else {
    # Loads to "res"
    load(resfn)
    resBAYAREALIKE = res
    }

#######################################################
# Run BAYAREALIKE+J
#######################################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size

# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"

BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$num_cores_to_use=1
BioGeoBEARS_run_object$force_sparse=FALSE    # sparse=FALSE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object$speedup=TRUE          # seems to work OK
BioGeoBEARS_run_object$calc_ancprobs=TRUE    # get ancestral states from optim run

# Run to check inputs
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
# Divide the tree up by strata
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)

BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE
BioGeoBEARS_run_object$master_table

# Set up BAYAREALIKE+J model
# No subset sympatry
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["s","est"] = 0.0

# No vicariance
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","init"] = 0.0
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["v","est"] = 0.0

# *DO* allow jump dispersal/founder-event speciation
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] = "free"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] = 0.01
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","est"] = 0.01

# Adjust linkage between parameters
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ysv","type"] = "1-j"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["ys","type"] = "ysv*1/1"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["y","type"] = "1-j"

# Only sympatric/range-copying (y) events allowed, and with 
# exact copying (both descendants always the same size as the ancestor)
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","type"] = "fixed"
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","init"] = 0.9999
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["mx01y","est"] = 0.9999

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

resfn = "Psychotria_BAYAREALIKE+J_M0_unconstrained_v1.Rdata"
runslow = TRUE
if (runslow)
    {
    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)

    resBAYAREALIKEj = res
    } else {
    # Loads to "res"
    load(resfn)
    resBAYAREALIKEj = res
    }

pdffn = "Psychotria_BAYAREALIKE_vs_BAYAREALIKE+J_M0_unconstrained_v1.pdf"
pdf(pdffn, width=8.5, height=11)

#######################################################
# Plot ancestral states - BAYAREALIKE
#######################################################
analysis_titletxt ="BioGeoBEARS BAYAREALIKE on Psychotria M0_unconstrained"

# Setup
results_object = resBAYAREALIKE
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

#######################################################
# Plot ancestral states - BAYAREALIKE+J
#######################################################
analysis_titletxt ="BioGeoBEARS BAYAREALIKE+J on Psychotria M0_unconstrained"

# Setup
results_object = resBAYAREALIKEj
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res1 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)

#######################################################
# Stats
#######################################################
# We have to extract the log-likelihood differently, depending on the 
# version of optim/optimx
if (BioGeoBEARS_run_object$use_optimx == TRUE)
    {
    # Using optimx() results
    if (packageVersion("optimx") < 2013)
        {
        # optimx 2012
        LnL_2 = as.numeric(resBAYAREALIKE$optim_result$fvalues)
        LnL_1 = as.numeric(resBAYAREALIKEj$optim_result$fvalues)
        } else {
        # optimx 2013
        LnL_2 = as.numeric(resBAYAREALIKE$optim_result$value)
        LnL_1 = as.numeric(resBAYAREALIKEj$optim_result$value)
        } # end optimx 2012 vs. 2013
    } else {
    # Using optim() results
    LnL_2 = as.numeric(resBAYAREALIKE$optim_result$value)
    LnL_1 = as.numeric(resBAYAREALIKEj$optim_result$value)
    } # end optim vs. optimx

numparams1 = 3
numparams2 = 2
stats = AICstats_2models(LnL_1, LnL_2, numparams1, numparams2)
stats

res2    # BAYAREALIKE results
res1    # BAYAREALIKE+J results

rbind(res2, res1)
conditional_format_table(stats)

tmp_tests = conditional_format_table(stats)

restable = rbind(restable, res2, res1)
teststable = rbind(teststable, tmp_tests)

#########################################################################
# RESULTS: DEC, DEC+J, DIVALIKE, DIVALIKE+J, BAYAREALIKE, BAYAREALIKE+J
#########################################################################
teststable$alt = c("DEC+J", "DIVALIKE+J", "BAYAREALIKE+J")
teststable$null = c("DEC", "DIVALIKE", "BAYAREALIKE")
row.names(restable) = c("DEC", "DEC+J", "DIVALIKE", "DIVALIKE+J", "BAYAREALIKE", "BAYAREALIKE+J")

# Look at the results!!
restable
teststable

#######################################################
# Your results should be:
#######################################################

# > restable
# 
#                 LnL numparams            d            e         j
# DEC           -34.5         2 3.504546e-02 2.835632e-02 0.0000000
# DEC+J         -20.9         3 1.000000e-12 1.000000e-12 0.1142811
# DIVALIKE      -33.1         2 4.474416e-02 1.000000e-12 0.0000000
# DIVALIKE+J    -21.1         3 2.001000e-09 1.000000e-12 0.1157199
# BAYAREALIKE   -40.3         2 1.738085e-02 3.040188e-01 0.0000000
# BAYAREALIKE+J -21.6         3 1.000000e-12 1.000000e-12 0.1081158

#######################################################
# The p-value of the LRT (Likelihood Ratio Test) tells you whether or not you can reject the
# null hypothesis that DEC and DEC+J confer equal likelihoods on the data
#
# AIC and AIC model weights are also shown, giving a sense of the relative probability of the two models.
#
# (One could easily do model weights between all 6 models, but this is not done here; see Brian O'Meara's AIC webpage)
#######################################################

# > teststable
# 
#              alt        null LnLalt LnLnull DFalt DFnull DF Dstatistic    pval        test       tail  AIC1  AIC2 AICwt1  AICwt2 AICweight_ratio_model1 AICweight_ratio_model2
# 1          DEC+J         DEC -20.95  -34.54     3      2  1      27.19 1.8e-07 chi-squared one-tailed  47.9 73.08   1.00 3.4e-06                 294893                3.4e-06
# 11    DIVALIKE+J    DIVALIKE -21.09  -33.15     3      2  1      24.13 9.0e-07 chi-squared one-tailed 48.17  70.3   1.00 1.6e-05                  63797                1.6e-05
# 12 BAYAREALIKE+J BAYAREALIKE -21.09  -33.15     3      2  1      24.13 9.0e-07 chi-squared one-tailed 48.17  70.3   1.00 1.6e-05                  63797                1.6e-05
#

SCRIPT: BioGeoBEARS legends

Based on this post to the BioGeoBEARS google group.

It is difficult to make legends work automatically (e.g. via plotlegends=TRUE), because often there are so many states/ranges, and phylogenies can have many different sizes and shapes.

I originally wanted to add the legend to people's tree plots, but that is very difficult to make work. So I think it's easier for people to just build the legend separately.

Here is a script that does legends / keys for the default Hawaii dataset, or you can paste in the latter part of the script to your own analysis script.

#######################################################
# Example key table, and legend
#######################################################

#######################################################
# Make a table that is the abbreviations key
#######################################################
# Here, the file with the "key" is a tab-delimited text file -- 
# change to your file location if you don't want Hawaii
keyfn = "http://phylo.wdfiles.com/local--files/biogeobears/Hawaii_abbreviations_table.txt"

keydf = read.table(keyfn, header=TRUE, sep="\t", stringsAsFactors=FALSE, fill=TRUE)
keydf

keydf2 = keydf[,c("ab1", "desc")]
names(keydf2) = c("Abbr", "Description")
keydf2

library(gridExtra)    # for grid.table() function
grid.table(keydf2, show.rownames=FALSE)

# Defaults, if you want to use the Hawaii names/abbreviations from above
max_range_size = 4
include_null_range = TRUE
areanames = keydf$ab1
areanames

# You could paste the below directly into your script, if it is
# set up like in the PhyloWiki example script

#######################################################
# Plot legend for ALL states/ranges (there may be a 
# ton, and getting them all to display is hard)
#######################################################
pdffn = "colors_legend_all_v1.pdf"
pdf(pdffn, width=6, height=6)

areanames = names(tipranges@df)
areanames

include_null_range = TRUE

states_list_0based_index = rcpp_areas_list_to_states_list(areas=areanames, maxareas=max_range_size, include_null_range=include_null_range)

statenames = areas_list_to_states_list_new(areas=areanames, maxareas=max_range_size, include_null_range=include_null_range, split_ABC=FALSE)
statenames

relprobs_matrix = resDEC$ML_marginal_prob_each_state_at_branch_top_AT_node
MLprobs = get_ML_probs(relprobs_matrix)
MLstates = get_ML_states_from_relprobs(relprobs_matrix, statenames, returnwhat="states", if_ties="takefirst")

colors_matrix = get_colors_for_numareas(length(areanames))
colors_list_for_states = mix_colors_for_states(colors_matrix, states_list_0based_index, exclude_null=(include_null_range==FALSE))
colors_list_for_states

possible_ranges_list_txt = areas_list_to_states_list_new(areas=areanames,  maxareas=max_range_size, split_ABC=FALSE, include_null_range=include_null_range)
cols_byNode = rangestxt_to_colors(possible_ranges_list_txt, colors_list_for_states, MLstates)

legend_ncol=NULL
legend_cex=1.5
colors_legend(possible_ranges_list_txt, colors_list_for_states, legend_ncol=legend_ncol, legend_cex=legend_cex)

dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)

#######################################################
# Plot legend for SOME states/ranges 
# (especially since the widespread ranges are just
#  combinations of the primary colors used for 
#  single-area ranges)
#######################################################

pdffn = "colors_legend_some_v1.pdf"
pdf(pdffn, width=6, height=6)

# Subset to just some ranges (since there are sooo many combinations)
states_to_put_in_legend = c(1,2,3,4,5)
colors_list_for_states_subset = colors_list_for_states[states_to_put_in_legend]
possible_ranges_list_txt_subset = possible_ranges_list_txt[states_to_put_in_legend]

legend_ncol=NULL
legend_cex=1.5
colors_legend(possible_ranges_list_txt_subset, colors_list_for_states_subset, legend_ncol=legend_ncol, legend_cex=legend_cex)

dev.off()
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr)

Note: datasets & scripts for reviewers (temporary link)

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License