This is an FAQ on the trait-based dispersal model in BioGeoBEARS. Various papers using this model are in various stages of revision, but here I will accumulate answers to common questions, and dangers to look out for.
How complex should I make the model?
How many traits and areas can be used to have practical computation speed?
BioGeoBEARS (and several other programs) are strongly limited by the size of the state space. The "state space" is the number of possible states that the data might take. In biogeography, the size of the state space = the number of ranges.
Crucial points:
- A key part of the likelihood calculations is "exponentiating the rate matrix"
- The rate matrix has (number_of_states) rows and (number_of_states) columns.
- Exponentiating the rate matrix thousands of times (required for every branch of the tree, on every calculation of the likelihood during ML search) gets slow at 1500 states (days to compute), very slow at 2000 states (weeks to compute), and impractically slow above about 2500 states (your computer will never finish)
- If max_areas = number of areas, the number of ranges equals 2^(number of areas). Thus, 2 areas = 4 states, 3 areas = 8 states, etc. If max_areas is less than the number of areas, the number of ranges goes up more slowly, see the BioGeoBEARS function numstates_from_numareas.
- Adding a binary trait is like adding another range, i.e. it doubles the size of the state space
- Adding a trinary trait (3 states) triples the size of the state space if a species can have only one of the states, and it multiplies the state space by 8 if the species can take anywhere from 0-3 of the trait states
- So, if you have 10 areas, that is 1024 biogeographic states. If you have a 3-state trait (e.g. small, medium, large seeds), that increases the size of the state space to 1024x3 = 3072 states. This is basically too big to do practical ML inference (perhaps with supercomputers and massive parallelization, but even this strategy will soon break down).
How many free parameters can my dataset size support?
A key tradeoff in BioGeoBEARS models (and all other model-based inference) is:
- the number of free parameters (free parameters are the parameters you try to infer from the data, e.g. via Maximum Likelihood)
- the number of data (the dataset size; here, this is the number of species in your phylogeny (or number of species-like populations)
Rules of thumb:
- It is impossible to do accurate inference if you have more free parameters than data (in ML; in a Bayesian framework, it is possible, but you have to rely on informative priors).
- Is is extremely dubious and basically un-publishable to make confident inferences if there are only a few data per free parameter. If phrased as experimental or only suggestive, you might get away with it, but you should be nervous and express the dangers in the Discussion section of the paper.
- Ideally you have at least 5-10 data per free parameter.
- You should always use AICc rather than AIC if you have small data and/or many parameters.
- ML optimizers are increasing likely to fail at successful optimization as the number of free parameters increases. So, more free parameters means
- use more thorough optimizers (e.g. GenSA rather than optim or optimx)
- do more elaborate manual checks (start optimization runs from different starting points)
- use rerun_optimization and/or manual functions you write, to re-run the optimizations from several different starting points
- read and UNDERSTAND the pitfalls of optimizers, and danger signs to look for: Issues with Maximum Likelihood (ML) optimization
Number of free parameters (model complexity)
Even in early discussions of the trait-based dispersal model with just a few researchers, I have already noticed a common, but dangerous, tendency. The tendency is this: "I have measured X traits or Y different states for my characters of interest, let's build a model that checks them all for their influence on dispersal probability!"
Before embarking down this path, the first thing to think is: HOLD ON! Read the above advice on dataset size versus number of free parameters.
Then, consider the following:
For a binary trait (two-state character), there are 6 free parameters:
d = rate of range expansion (adding an area, anagenetic change along branches, e.g. A->AB)
e = rate of range contraction (subtracting an area, anagenetic change along branches, e.g. AB->A)
j = relative weight of jump dispersal at cladogenesis (e.g. ancestor just before speciation=AB -> descendents just after speciation left=AB, right=C)
t12 = transition rate from trait state 1 to trait state 2
t21 = transition rate from trait state 2 to trait state 1
m1 = multiplier on the base d and base j, for the hypothesized most-dispersive trait (which should be trait state 1). m1 is fixed to equal 1.
m2 = multiplier on the base d and base j, for the hypothesized less-dispersive trait (which should be trait state 2)
For a "trinary" trait (three-state character), there are 11 free parameters:
d = rate of range expansion (adding an area, anagenetic change along branches, e.g. A->AB)
e = rate of range contraction (subtracting an area, anagenetic change along branches, e.g. AB->A)
j = relative weight of jump dispersal at cladogenesis (e.g. ancestor just before speciation=AB -> descendents just after speciation left=AB, right=C)
t12 = transition rate from trait state 1 to trait state 2
t21 = transition rate from trait state 2 to trait state 1
t13 = transition rate from trait state 1 to trait state 3
t31 = transition rate from trait state 3 to trait state 1
t23 = transition rate from trait state 2 to trait state 3
t32 = transition rate from trait state 3 to trait state 2
m1 = multiplier on the base d and base j, for the hypothesized most-dispersive trait (which should be trait state 1). m1 is fixed to equal 1.
m2 = multiplier on the base d and base j, for the hypothesized medium-dispersive trait (which should be trait state 2)
m3 = multiplier on the base d and base j, for the hypothesized low-dispersive trait (which should be trait state 3)
For a "quaternary" trait (four-state character), there are 18 free parameters:
d = rate of range expansion (adding an area, anagenetic change along branches, e.g. A->AB)
e = rate of range contraction (subtracting an area, anagenetic change along branches, e.g. AB->A)
j = relative weight of jump dispersal at cladogenesis (e.g. ancestor just before speciation=AB -> descendents just after speciation left=AB, right=C)
t12 = transition rate from trait state 1 to trait state 2
t21 = transition rate from trait state 2 to trait state 1
t13 = transition rate from trait state 1 to trait state 3
t31 = transition rate from trait state 3 to trait state 1
t14 = transition rate from trait state 1 to trait state 4
t41 = transition rate from trait state 4 to trait state 1
t23 = transition rate from trait state 2 to trait state 3
t32 = transition rate from trait state 3 to trait state 2
t24 = transition rate from trait state 2 to trait state 4
t42 = transition rate from trait state 4 to trait state 2
t34 = transition rate from trait state 3 to trait state 4
t43 = transition rate from trait state 4 to trait state 3
m1 = multiplier on the base d and base j, for the hypothesized most-dispersive trait (which should be trait state 1). m1 is fixed to equal 1.
m2 = multiplier on the base d and base j, for the hypothesized medium-high dispersive trait (which should be trait state 2)
m3 = multiplier on the base d and base j, for the hypothesized medium-low dispersive trait (which should be trait state 3)
m4 = multiplier on the base d and base j, for the hypothesized low-dispersive trait (which should be trait state 4)
(Note, in the above, m1 is not a free parameter, is a fixed parameter. The parameter m1 is fixed to the value 1. But I include it in the list so that you don't wonder where m1 is.)
As you increase the number of free parameters, all of the dangers increase, and you stress your dataset size more and more.
In addition, more complex models have many more possible sub-models (where some of the parameters have been fixed to certain values), and many researchers have the instinct "we might as well run every imaginable model, because it is easier to write a for-loop than it is think carefully about what we are doing." This is a murky issue, but it raises the danger of "model degrees of freedom", e.g. if you run enough models you will eventually find something that looks interesting but actually is just an accident.
It is worth carefully considering the advice of Burnham and Anderson (2002 and many other publications) on statistical model comparison. They recommend:
- “a considerable amount of careful, a priori thinking in arriving at a set of candidate models,” “keeping the number of candidate models small,” and using researcher experience and prior knowledge to ensure the models that are used are “well-founded” (Burnham and Anderson 2002, pp. 17, 18).
- Researchers should avoid untargeted “fishing expeditions,” where dozens or hundreds of traits are tested for their “influence” on a parameter (dispersal rate or any other parameter of interest). Fishing expeditions have a high chance of false positives, because so many tests are being done, something will eventually be found just by accident (Anderson 2008).