On this page, I will accumulate mistakes that I spot in the literature, especially pertaining to likelihood, the Likelihood Ratio Test, AIC, and optimization problems.
In most cases, these mistakes are not fatal to interpretation, but they do indicate worrisome unfamiliarity with the basics of statistical model choice. The mistakes would be avoided by consultation of Burnham & Anderson's Model selection and multimodel inference: a practical information-theoretic approach, which has 30,000+ citations on Google Scholar.
A shorter introduction to the basics of statistical model choice may be found on my PhyloWiki pages on BioGeoBEARS Mistakes To Avoid and Advice On Statistical Model Comparison In BioGeoBEARS.
I may write a review on the topic at some point for a journal where these mistakes seem particularly frequent.
Examples
LRT / AIC
Viruel, J., Segarra-Moragues, J. G., Raz, L., Forest, F., Wilkin, P., Sanmartín, I. and Catalán, P. (2015), Late Cretaceous–Early Eocene origin of yams (Dioscorea, Dioscoreaceae) in the Laurasian Palaearctic and their subsequent Oligocene–Miocene diversification. Journal of Biogeography, published online December 18, 2015.
http://dx.doi.org/10.1111/jbi.12678
http://onlinelibrary.wiley.com/doi/10.1111/jbi.12678/abstract
The two biogeographical DEC models with fossil AA constraints (M0 and M1) gave similar results (Fig. 4, Appendices S1, S3 Fig. S3), but reconstructed distinct biogeographical scenarios for the most recent common ancestor (MRCA) of Dioscorea (Node 132). Because the stratified M1 model showed a better fit to the data than the unconstrained M0 model (−ln likelihood 281.4 versus 316.9, respectively; likelihood ratio test, P = 0.001), we will refer to the results from this model hereafter (but see comments below and in the Discussion). The global estimated dispersal rate for the M1 model (dis: 0.0136947) was three times higher than the estimated extinction rate (ext: 0.00465982).
Mistake: The Likelihood Ratio Test can only be used to compare two models when one model is "nested" inside the other. "Nesting" means, specifically and exactly, that the simpler model is obtained by fixing the values of one or more parameters that are free in the more complex model. In other words, the simpler model is a special case of the more complex model.
In the example above, a time-stratified biogeographical model is being compared to an unconstrained biogeographical model. Both models have the same number of free parameters (namely, two: d and e, representing "dispersal"/range expansion and "extinction"/local extirpation).
The models differ in having different fixed multipliers on dispersal probabilities between regions. In the "unconstrained" model, the dispersal multipliers between all regions are set to 1 during all time periods (obviously, this does not need to be explicitly set in the program, because multiplying by 1 changes nothing). In the time-stratified model, the multipliers are set to other fixed values. Comparing two models with different fixed parameters is comparing two models that are non-nested. The models can be compared with AIC, AICc, etc., or even by just eyeballing the log-likelihood values — but the Likelihood Ratio Test is not appropriate.
This is because the whole point of the LRT is to ask whether the likelihood of the data improves significantly beyond what would be expected by chance when adding a free parameter. Adding a free parameter will almost always improve the maximized likelihood at least slightly, and can never decrease it. (If you do see a decrease when adding a free parameter, this means either that your program has a math mistake, or, more likely, that your maximum-likelihood algorithm has failed to find the maximum likelihood. See ML optimization routines and their pitfalls.)
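To make the logic of the test concrete, here is a minimal sketch of an LRT for two properly nested models that differ by exactly one free parameter. The function name and the log-likelihood values are hypothetical, chosen only for illustration. The p-value uses the closed form of the chi-squared tail probability for one degree of freedom, so this sketch does not cover comparisons with more than one added parameter.

```python
import math


def lrt_one_extra_parameter(lnl_simple, lnl_complex):
    """Likelihood Ratio Test for nested models differing by ONE free parameter.

    lnl_simple:  maximized log-likelihood of the simpler (nested) model
    lnl_complex: maximized log-likelihood of the more complex model
    """
    if lnl_complex < lnl_simple:
        # A nested simpler model can never fit better than the complex one;
        # if it does, suspect a failed ML optimization.
        raise ValueError("complex model has lower lnL than the nested model")
    statistic = 2.0 * (lnl_complex - lnl_simple)
    # Chi-squared tail probability with 1 degree of freedom:
    # P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only
stat, p = lrt_one_extra_parameter(lnl_simple=-105.0, lnl_complex=-101.5)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

Note that the function refuses to run when the "complex" model has the lower log-likelihood, since that situation signals an optimization problem rather than a testable result.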
(Of course, the AIC etc. do not yield p-values, but if your model weights are strongly in favor of a more complex model, this is similar evidence that a model is a better explanation of the data.)
Mistake: Using LRT on non-nested models
These papers use Likelihood-Ratio Tests to compare models with different manual dispersal modifier matrices. Noben et al. (2017) get the idea from Sánchez-Ramírez et al. (2015), whom they cite.
Noben, Sarah; Kessler, Michael; Quandt, Dietmar; Weigand, Anna; Wicke, Susann; Krug, Michael; Lehnert, Marcus (2017). Biogeography of the Gondwanan tree fern family Dicksoniaceae—A tale of vicariance, dispersal and extinction. Journal of Biogeography, 44(11), 2648–2659. Published online 11 July 2017.
URL: http://onlinelibrary.wiley.com/doi/10.1111/jbi.13056/full
DOI: http://dx.doi.org/10.1111/jbi.13056
Sánchez-Ramírez, S., Tulloss, R. E., Amalfi, M., & Moncalvo, J.-M. (2015). Palaeotropical origins, boreotropical distribution and increased rates of diversification in a clade of edible ectomycorrhizal mushrooms (Amanita section Caesareae). Journal of Biogeography, 42, 351–363. First published: 27 August 2014
URL: http://onlinelibrary.wiley.com/doi/10.1111/jbi.12402/full
DOI: http://dx.doi.org/10.1111/jbi.12402
This is incorrect, because changing the manual dispersal modifier matrices is changing fixed parameters, not adding a free parameter. Only the latter creates a nested model (the model with the added free parameter contains the model without the free parameter nested within it — see PhyloWiki: Nesting of Models).
These papers assume one degree of freedom in their use of the chi-squared test on the likelihood ratio, but one degree of freedom is supposed to mean that the more complex model has one additional free parameter. The "correct" number of degrees of freedom here would be zero, but a chi-squared test with zero degrees of freedom cannot be calculated.
Now, the difference between 0 and 1 is not huge, and the degree of freedom essentially constitutes a likelihood penalty of a few units for more complex models (with one degree of freedom and a 0.05 cutoff, the log-likelihood must improve by about 1.92 units, i.e. half of the chi-squared critical value of 3.84). So, if the same dataset has very different likelihoods under two different models, this difference in likelihood will dominate the test result, and it won't be horribly misleading. But, technically, the P-values produced are not meaningful.
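As a rough illustration of why a large likelihood difference dominates the result, the sketch below compares the log-likelihood improvement that a one-degree-of-freedom test demands at the 0.05 level with the difference quoted earlier on this page from Viruel et al. (2015). The constant 3.841 is the standard tabulated critical value; the variable and function names are my own.

```python
import math


def chi2_tail_df1(statistic):
    """Chi-squared tail probability for 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Standard tabulated 5% critical value of chi-squared with 1 df
CRITICAL_VALUE_DF1 = 3.841

# The LRT statistic is 2 * (difference in lnL), so the implied
# "penalty" in log-likelihood units is half the critical value.
lnl_units_needed = CRITICAL_VALUE_DF1 / 2.0

# -ln likelihood values quoted from Viruel et al. (2015): M1 vs M0
neg_lnl_m1 = 281.4
neg_lnl_m0 = 316.9
delta_lnl = neg_lnl_m0 - neg_lnl_m1

print(f"lnL units needed at P = 0.05 (1 df): {lnl_units_needed:.2f}")
print(f"observed lnL difference:             {delta_lnl:.1f}")
print(f"tail probability at critical value:  {chi2_tail_df1(CRITICAL_VALUE_DF1):.3f}")
```

The observed difference is far larger than the penalty, which is why the conclusion about which model fits better survives even though the P-value itself is not meaningful.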
A correct strategy would have been to use AIC or AICc to compare the non-nested models, following Burnham and Anderson (2002): http://phylo.wikidot.com/advice-on-statistical-model-comparison-in-biogeobears#AIC_Bible . This involves giving up on producing P-values, however, which still seems to be outside the mindspace of many researchers, even though Burnham and Anderson (2002) has tens of thousands of citations and its methods are well-accepted.
Note: If all models have the same number of free parameters, as in the papers above, one can just use the maximized log-likelihoods to calculate model weights etc. The results will be identical to what you would get with AIC or AICc, because the penalty term is the same for every model and cancels out.
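The equivalence can be checked with a short sketch. It uses the standard formulas AIC = 2k - 2 lnL and AICc = AIC + 2k(k+1)/(n-k-1), and the -ln likelihood values quoted earlier on this page from Viruel et al. (2015), with k = 2 free parameters (d and e) for both models. The sample size n = 100 is a made-up placeholder, since the real value is not given here; the function names are likewise my own.

```python
import math


def aic(lnl, k):
    """Akaike Information Criterion for log-likelihood lnl and k free parameters."""
    return 2.0 * k - 2.0 * lnl


def aicc(lnl, k, n):
    """Small-sample corrected AIC; n is the sample size."""
    return aic(lnl, k) + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Model weights from AIC or AICc scores (lower score = better)."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]


def likelihood_weights(lnls):
    """Model weights computed directly from maximized log-likelihoods."""
    best = max(lnls)
    rel = [math.exp(v - best) for v in lnls]
    total = sum(rel)
    return [r / total for r in rel]


lnls = [-281.4, -316.9]  # M1 and M0, from the quoted -ln likelihoods
k = 2                    # d and e are the only free parameters in both models
n = 100                  # HYPOTHETICAL sample size, for illustration only

w_aic = akaike_weights([aic(v, k) for v in lnls])
w_aicc = akaike_weights([aicc(v, k, n) for v in lnls])
w_lnl = likelihood_weights(lnls)
print(w_aic, w_aicc, w_lnl)
```

All three weight vectors agree up to floating-point rounding, because with equal k (and the same n) the penalty is a constant that drops out of every difference between models.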
Note: The above highlights what I have noticed to be a common confusion, which is that people doing biogeography analyses confuse free parameters with fixed modifications to things like the dispersal multiplier matrices. To hit it again:
- Free parameters are unknown before the program looks at the data. They could take a wide range of values. The values of the parameters are inferred via Maximum Likelihood or some other estimation algorithm (Bayesian MCMC etc.).
- Modifications to manual dispersal modifiers are specified by the user ahead of time. They are thus fixed throughout the subsequent analysis. Therefore they are a different thing than free parameters.
Solutions: +x, +w model variants
All of that said, while the above problem is just a common misunderstanding of statistics, it isn't the most important problem with the methods used above. The real problem with the above studies, and dozens of others that use manual-dispersal-multiplier matrices, is: Where did the values of the dispersal multipliers come from? The answer always basically boils down to intuition: continents A and B are touching, so they get a multiplier of 1, A and C are near each other, so they get a multiplier of 0.5, and A and D are far apart, so they get 0.01. Researchers might well agree that the order of these multipliers is correct, but why are these multipliers better than, say, 1, 0.1, and 0.0001?
For the above reasons, I have long advocated for using something objective, like geographical distance, to determine the relative dispersal probabilities between different areas, and then (crucially), estimating the effect of this predictor using something like the "+x" model variant. This strategy is proposed by Van Dam and Matzke (2016), J. Biogeography.
A similar strategy is to use a "+w" model on the user-determined manual dispersal multiplier matrix; this is proposed in Dupin, Matzke et al. (2016), J. Biogeography.
These can even be combined, where several different possible distances or dispersal multipliers are modified by different parameters, which are estimated from the data. See this post on the BioGeoBEARS Google Group: https://groups.google.com/d/msg/biogeobears/5pA9w5zGqa4/Dnub2ONkCQAJ
As a side benefit, the +x and +w models (and the like) create sets of nested models, allowing researchers to subject dispersal-multiplier matrices to valid Likelihood Ratio Tests, if they so desire.
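The nesting can be seen in a small sketch. I assume here that the +x variant raises distance to the power x and the +w variant raises each manual dispersal multiplier to the power w, with x and w being free parameters estimated from the data. The function names, the matrices, and the handling of the diagonal are hypothetical simplifications for illustration, not the actual BioGeoBEARS code.

```python
def plus_x_multipliers(distances, x):
    """Dispersal multipliers as distance ** x (x is a FREE parameter).

    Assumption: within-area (diagonal) entries are left at 1.
    """
    n = len(distances)
    return [
        [1.0 if i == j else float(distances[i][j]) ** x for j in range(n)]
        for i in range(n)
    ]


def plus_w_multipliers(manual_multipliers, w):
    """User-specified multipliers raised to the power w (w is a FREE parameter)."""
    return [[float(m) ** w for m in row] for row in manual_multipliers]


# HYPOTHETICAL relative distances and manual multipliers for three areas
distances = [[0.0, 1.0, 4.0],
             [1.0, 0.0, 2.0],
             [4.0, 2.0, 0.0]]
manual = [[1.0, 1.0, 0.5],
          [1.0, 1.0, 0.01],
          [0.5, 0.01, 1.0]]

# Fixing x = 0 or w = 0 turns every multiplier into 1, which recovers the
# unconstrained model. That is what makes the simpler model nested.
print(plus_x_multipliers(distances, 0.0))
print(plus_w_multipliers(manual, 0.0))

# A negative x makes dispersal less probable as distance increases.
print(plus_x_multipliers(distances, -1.0))
```

Setting w = 1 returns the user's matrix unchanged, so the fixed-multiplier model is itself a special case of the +w model, and the data can say whether the intuited multipliers are too strong, too weak, or about right.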
BayArea model
Is the sword moss (Bryoxiphium) a preglacial Tertiary relict?
Since different ancestral area reconstructions are based on different assumptions and can produce conflicting results (Pirie et al., 2012, Matzke, 2013 and Matzke, 2014), we compared these two versions of the DEC model with a likelihood version of the Dispersal-Vicariance Analysis (DIVALIKE), and a likelihood version of the range evolution model of the Bayesian Binary Model (BAYAREA) of RASP (Yu et al., 2015).
Short version: BayArea is the model of Landis et al. (2013). The Bayesian Binary Model (BBM) is not the same: it just treats every area as a binary character, and RASP ran this with the MrBayes library. However, I think the new version of RASP has abandoned BBM, which was deeply flawed (for example, ancestors living nowhere were allowed), and may run the BayArea library instead. BioGeoBEARS implements BAYAREALIKE, a likelihood interpretation of BayArea. The similarities and differences of BAYAREALIKE and BayArea are discussed in the example script on the main BioGeoBEARS page.
The one thing BayArea, BBM, and BAYAREALIKE all share is that there is no special cladogenesis process: under these models, the ancestral range, whether narrow or widespread, is copied to both descendants at speciation with 100% probability. For most taxa this seems less plausible than a model with a cladogenesis process, and the model usually confers much lower likelihood on the data than other models, but not always (e.g., clownfish).
Key References to Avoid Mistakes
Burnham, K.P.; Anderson, D.R. (2004). Multimodel Inference: Understanding AIC and BIC in Model Selection. Sociological Methods & Research, 33(2), 261-304. http://dx.doi.org/10.1177/0049124104268644
Burnham, K.P.; Anderson, D.R. (2002). Model selection and multimodel inference: a practical information-theoretic approach. Springer.
https://books.google.com/books?id=fT1Iu-h6E-oC&printsec=frontcover#v=onepage&q&f=false
https://scholar.google.com/scholar?hl=en&q=+Model+selection+and+multimodel+inference%3A+a+practical+information-theoretic+approach&btnG=&as_sdt=1%2C38&as_sdtp=
ESPECIALLY SEE:
Anderson, David; Burnham, Kenneth (2006). "AIC Myths and Misunderstandings." Authors' website at Colorado State University. Last modified April 12, 2006. URL: https://sites.warnercnr.colostate.edu/anderson/wp-content/uploads/sites/26/2016/11/AIC-Myths-and-Misunderstandings.pdf
Matzke (2017). Advice On Statistical Model Comparison In BioGeoBEARS. Last modified September 2017. URL: http://phylo.wikidot.com/advice-on-statistical-model-comparison-in-biogeobears
…and the references therein: http://phylo.wikidot.com/advice-on-statistical-model-comparison-in-biogeobears#refs
References on +x, +w models (exponents on distance or dispersal multipliers)
Dupin, Julia; Matzke, Nicholas J.; Sarkinen, Tiina; Knapp, Sandra; Olmstead, Richard; Bohs, Lynn; Smith, Stacey (2016). Bayesian estimation of the global biogeographic history of the Solanaceae. Journal of Biogeography, 44(4), 887-899. http://dx.doi.org/10.1111/jbi.12898
Van Dam, Matthew; Matzke, Nicholas J. (2016). Evaluating the influence of connectivity and distance on biogeographic patterns in the south-western deserts of North America. Journal of Biogeography. 43(8):1514–1532. Special paper, published online 3 March 2016. http://dx.doi.org/10.1111/jbi.12727