Abstracts For Presentations By Nicholas J Matzke

SMBE 2014, Puerto Rico

Primary endosymbiosis events date to the later Proterozoic with cross-calibrated phylogenetic dating of duplicated ATPase proteins

Monday 9th June: Life Technologies Lunchtime Symposium / Posters 1001 - 1278 - 9th June 13.00 - 15.30


Nicholas Matzke1,2, Patrick Shih3,4

1 National Institute for Mathematical and Biological Synthesis, University of Tennessee, Knoxville, TN, USA,

2 Department of Integrative Biology, University of California, Berkeley, CA, USA,

3 Joint Bioenergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, USA,

4 Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA

Chloroplasts and mitochondria descended from bacterial ancestors, but the dating of these primary endosymbiosis events remains very uncertain, despite their importance for our understanding of the evolution of both bacteria and eukaryotes. All phylogenetic dating in the Proterozoic and before is difficult: significant debates surround potential fossil calibration points based on the interpretation of the Precambrian microbial fossil record, and strict molecular clock methods cannot be expected to yield accurate dates over such vast timescales because of strong heterogeneity in rates. Even with more sophisticated relaxed-clock analyses, nodes that are distant from fossil calibrations will have very high uncertainty in dating. However, endosymbiosis events and gene duplications provide additional information that has never been exploited in dating; namely, that certain nodes on a gene tree must represent the same events, and thus must have the same or very similar dates, even if the exact date is uncertain. We devised techniques to exploit this information: cross-calibration, in which node date calibrations are reused across a phylogeny, and cross-bracing, in which node date calibrations are formally linked in a hierarchical Bayesian model. We apply these methods to proteins with ancient duplications that have remained associated and originated from plastid and mitochondrial endosymbionts: the α and β subunits of ATP synthase and its relatives, and EF-Tu. The methods yield reductions in dating uncertainty of 14–26% while using only date calibrations derived from phylogenetically unambiguous Phanerozoic fossils of multicellular plants and animals. Our results suggest that primary plastid endosymbiosis occurred ~900 Mya and mitochondrial endosymbiosis occurred ~1,200 Mya.
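The intuition for why linking nodes that represent the same event can shrink dating uncertainty can be sketched with a toy calculation. The sketch below is not the cross-calibration or cross-bracing procedure described in the abstract. It only assumes two independent Gaussian estimates of one shared date and combines them by precision weighting. The function name and all numbers are hypothetical.

```python
import math


def combine_estimates(mean_a, sd_a, mean_b, sd_b):
    """Precision-weighted combination of two independent Gaussian
    estimates of the same quantity (a toy stand-in for linked node dates)."""
    prec_a = 1.0 / sd_a ** 2
    prec_b = 1.0 / sd_b ** 2
    total_prec = prec_a + prec_b
    mean = (mean_a * prec_a + mean_b * prec_b) / total_prec
    sd = math.sqrt(1.0 / total_prec)
    return mean, sd


# Hypothetical estimates (in Mya) for two nodes that record the same event.
combined_mean, combined_sd = combine_estimates(950.0, 150.0, 850.0, 150.0)
print(f"combined estimate: {combined_mean:.1f} Mya, sd {combined_sd:.1f}")
print(f"sd relative to a single estimate: {combined_sd / 150.0:.2f}")
```

In a real analysis the two node dates are not independent Gaussian estimates, so the size of the reduction in this toy example should not be compared with the reductions reported above.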

See poster:

PDF of poster for SMBE 2014

This work is based on:

Shih, Patrick M.; Matzke, Nicholas J. (2013). "Primary endosymbiosis events date to the later Proterozoic with cross-calibrated phylogenetic dating of duplicated ATPase proteins." Proceedings of the National Academy of Sciences, 110(30), 12355-12360. (Scholar | DOI | Journal)

Also, see writeup of this article by:

Zhaxybayeva, Olga (2013). "Anciently duplicated genes reduce uncertainty in molecular clock estimates." Proceedings of the National Academy of Sciences, 110(30), 12168–12169. (Scholar | DOI | Journal)

Evolution 2014, Raleigh, NC

Simulation tests of probabilistic models for historical biogeography: DEC and DEC+J

PDF of draft talk

2C_306B Methodology
Date: Sunday, June 22, 2014
Time: 1:30 PM - 2:45 PM
Location: 306 B
Chair: Mario dos Reis

2:00 PM - 2:15 PM


Nicholas Matzke, NIMBioS, matzke@nimbios.org (Presenter)

Contributed Presentation


Several phylogenetic models for historical biogeography are in widespread use, e.g. character mapping, Dispersal-Vicariance Analysis (DIVA), and Dispersal-Extinction-Cladogenesis (DEC). In addition, new models have become available: BayArea, and the variety of models implemented in the R package BioGeoBEARS. These include DEC+J (which adds founder-event speciation to DEC) and DIVALIKE (a likelihood interpretation of DIVA; a DIVALIKE+J model is also available). There has been very little testing of biogeographical models against simulated data in situations where the true model differs substantially from the assumed inference model. Also, all of the above models assume that the observed tree is the true tree, ignoring possible missing speciation/extinction events, and dependence of speciation/extinction rates on geographic range. These possibilities are taken into account by the GeoSSE and ClaSSE models, but at the cost of many more free parameters, which may strain typically small biogeographic datasets. To test the accuracy of DEC and DEC+J inference on datasets simulated under different biogeographical and SSE models, I jointly simulated phylogenies and geographic range under 6 macroevolutionary models. The first three assumed speciation/extinction were independent of geographic range: (1) Yule process (pure-birth, no extinction); (2) Birth-Death (BD) process with extinction rate 1/3 of the speciation rate; (3) BD process with extinction equal to speciation. The next three assumed an SSE model where the base speciation rate was multiplied by the number of areas occupied, and the base extinction rate was divided by the number of areas occupied. This produced (4) SSE with speciation but zero extinction rate; (5) SSE with the base extinction rate 1/3 of the speciation rate; and (6) SSE with base rates of speciation and extinction equal.
For each of the 6 macroevolutionary models, all combinations of low/middle/high values were used for these biogeographic parameters: d (rate of range-expansion), e (rate of range-contraction), and j (relative weight of founder-events versus traditional DEC cladogenesis events at speciation). The datasets (138 parameter combinations, 100 simulations each with 50 living species) were subjected to inference under DEC and DEC+J. DEC and DEC+J were distinguishable under all 6 models, except when j was very small and d very high. DEC artificially raises d and e when DEC+J is the true model, and shows significantly reduced accuracy in inferring ancestral range. These results indicate that the fact that DEC+J is favored over DEC by many empirical datasets is not likely to be an artefact of missing SSE processes.
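The range-dependent rates in the three SSE scenarios can be illustrated with a short sketch. It follows the rule stated in the abstract (base speciation rate multiplied by the number of areas occupied, base extinction rate divided by it). The function name and the base rate of 0.3 are hypothetical choices for illustration, not values from the study.

```python
def sse_rates(base_speciation, base_extinction, n_areas):
    """Return (speciation, extinction) rates for a lineage occupying
    n_areas areas: speciation scales up and extinction scales down."""
    if n_areas < 1:
        raise ValueError("a living lineage must occupy at least one area")
    return base_speciation * n_areas, base_extinction / n_areas


BASE_SPECIATION = 0.3  # hypothetical base rate, events per lineage per Myr

# The three SSE scenarios: zero, one-third, and equal base extinction.
scenarios = {
    "SSE, zero extinction": 0.0,
    "SSE, extinction = 1/3 speciation": BASE_SPECIATION / 3.0,
    "SSE, extinction = speciation": BASE_SPECIATION,
}

for name, base_extinction in scenarios.items():
    for n_areas in (1, 2, 3):
        lam, mu = sse_rates(BASE_SPECIATION, base_extinction, n_areas)
        print(f"{name}: {n_areas} area(s) -> speciation {lam:.3f}, extinction {mu:.3f}")
```

Under this rule a widespread lineage both speciates faster and goes extinct more slowly than a single-area lineage, which is the kind of range dependence that DEC and DEC+J ignore.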

Non-null effects of a null range: Exploring parameter estimation in the dispersal-extinction-cladogenesis model



3A_301A Phylogenetic Methods
Date: Monday, June 23, 2014
Time: 8:30 AM - 9:45 AM

Location: 301 A
Chair: Elizabeth Wade
9:15 AM - 9:30 AM

Kathryn Massana, University of Tennessee, Knoxville, kmassana@utk.edu (Presenter)
Jeremy Beaulieu, NIMBioS, jbeaulieu@nimbios.org
Brian O'Meara, bomeara@utk.edu
Nicholas Matzke, NIMBioS, matzke@nimbios.org

Parametric models in historical biogeography that integrate geographic ranges and phylogenies have been shown to be extremely informative about the geographic range evolution of taxa. One such approach is the dispersal-extinction-cladogenesis (DEC) model, which has been widely used in empirical analyses of the evolution of geographic range using discrete area states. However, local extinction rates are difficult to estimate well in this model. We explore the cause of this as well as a potential solution.

rexpokit and cladoRcpp: R packages integrating FORTRAN and C++ for faster matrix exponentiation and likelihood calculations in historical biogeography

PDF of draft talk

ievobioE_402 iEvoBio software bazaar

Open-source software demos and reception for iEvoBio

Date: Tuesday, June 24, 2014

Time: 3:15 PM - 5:00 PM

Location: 402

rexpokit and cladoRcpp: R packages integrating FORTRAN and C++ for faster matrix exponentiation and likelihood calculations in historical biogeography (1219)
Nicholas Matzke, NIMBioS (United States)
Drew Schmidt, University of Tennessee


Drew Schmidt, University of Tennessee, schmidt@math.utk.edu
Nicholas Matzke, NIMBioS, matzke@nimbios.org (Presenter)

iEvoBio Software Demo


Probabilistic models for phylogeny-based inference of historical biogeography face several computational challenges, especially when programmed in R. The first is large state spaces: for example, an analysis using 10 discrete geography areas has 2^10=1024 possible combinations of presence/absence in each region, and a transition matrix that is 1024x1024. Exponentiating this matrix is extremely slow in standard R matrix exponentiation routines. The R package "rexpokit" integrates the FORTRAN EXPOKIT library, making exponentiation of such large matrices feasible, although not rapid. Further speed improvements are made by parallel processing. A second challenge is enumerating and assigning probabilities to different biogeographical events at cladogenesis. Here, a naive implementation would have to examine every possible combination of ancestor state, left descendant state, and right descendant state, which would be 1023^3, or over 1 billion combinations. Here I made great improvements in speed with algorithms that eliminate impossible combinations a priori, and use of Rcpp for all for-loops. This is implemented in the R package cladoRcpp. I will present quick demonstrations of these calculations, the resulting speedups, and suggest that these packages can serve as relatively simple examples for researchers wishing to integrate FORTRAN or C++ into their R programming.
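The state-space arithmetic above can be checked with a small sketch. It is written in Python for illustration and is not code from rexpokit or cladoRcpp. The pruning rule used here (both descendant ranges must be non-empty subsets of the ancestor range) is an assumed toy rule that only shows how ruling out impossible combinations a priori shrinks the search. It is not the rule set used by cladoRcpp, and it would exclude founder-event (+J) outcomes.

```python
from itertools import combinations


def enumerate_ranges(areas, include_null=True):
    """List every geographic range (set of occupied areas) for the given areas."""
    smallest = 0 if include_null else 1
    ranges = []
    for size in range(smallest, len(areas) + 1):
        for combo in combinations(areas, size):
            ranges.append(frozenset(combo))
    return ranges


def count_naive_triplets(n_areas):
    """Ancestor x left x right combinations over all non-null ranges."""
    n_nonnull = 2 ** n_areas - 1
    return n_nonnull ** 3


def count_subset_triplets(areas):
    """Count triplets allowed under the toy rule: both descendant ranges
    are non-empty subsets of the ancestor range (an assumption)."""
    ranges = enumerate_ranges(areas, include_null=False)
    count = 0
    for ancestor in ranges:
        # Only ranges inside the ancestor can be descendants under the toy rule.
        candidates = [r for r in ranges if r <= ancestor]
        count += len(candidates) ** 2
    return count


print(len(enumerate_ranges(list("ABCDEFGHIJ"))))  # ranges for 10 areas
print(count_naive_triplets(10))                   # naive triplets, 10 areas
print(count_naive_triplets(4), count_subset_triplets(list("ABCD")))
```

Even this crude filter removes most of the naive combinations for four areas, and the saving grows with the number of areas.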

rexpokit: http://cran.r-project.org/web/packages/rexpokit/index.html
cladoRcpp: http://cran.r-project.org/web/packages/cladoRcpp/index.html
Used for historical biogeography by BioGeoBEARS (BioGeography with Bayesian (and Likelihood) Evolutionary Analysis in R Scripts): http://cran.r-project.org/web/packages/BioGeoBEARS/index.html
Examples, updates, and help listserv are at PhyloWiki: http://phylo.wikidot.com/biogeobears

GNU General Public License version 3.0 (GPL-3.0) http://opensource.org/licenses/GPL-3.0

SVP 2014, Berlin: Tip-Dating: Estimating Dated Phylogenies Using Fossils as Terminal Taxa


Note: See the BEASTmasteR code and example scripts!

Tip-Dating: Estimating Dated Phylogenies Using Fossils as Terminal Taxa - FULL

This workshop will introduce participants to new computational methods that allow joint inference of phylogenetic relationships and divergence times. In older dating methods, fossil relationships were estimated with an undated cladistic or Bayesian analysis, and then these fossils were converted, usually subjectively, into prior probability distributions on the dates of certain nodes. These calibrations were then used in molecular clock analyses to date molecular trees. This procedure essentially “threw away” hard-won fossil data (and any living morphology data as well) once the dating calibration was produced.

However, in the last two years, several methods have become available that allow the addition of fossil and living morphology, as well as fossil dates, to dating analyses. In these methods, the phylogenetic relationships of the fossils and living taxa are estimated simultaneously with the dating of the tree. These methods have the potential to be revolutionary for paleontologists. First, because character and dating data from fossil specimens are a requirement for the method, paleontologists and morphologists will have an increased role to play in future divergence time analyses, previously the domain of molecular biologists. Second, the joint estimation of fossil relationships and the divergence times of fossil taxa is of intrinsic interest, and many phylogenetic comparative methods can be applied to fossil data once statistically-estimated, time-scaled trees of fossil taxa are available.

The two main methods in use currently are BEAST (Pyron 2011; Wood, Matzke et al. 2013; Alexandrou et al. 2013) and MrBayes 3.3 (Ronquist et al. 2012). Both take more skill and background than traditional phylogeny-estimation and dating methods. Therefore we will guide participants through tutorials and then help them to set up analyses of their own data.

Date: Tuesday, November 4

Time: 10:00am - 4:00pm

Location: The Leibniz Headquarters (Chausseestr. 111, 150 meters away from the Museum für Naturkunde and next to the UBahn station Naturkundemuseum)

Cost: Free (FULL!)
Minimum Number of Participants: 10
Maximum Number of Participants: 40


Nicholas J. Matzke
National Institute for Mathematical and Biological Synthesis
University of Tennessee

April Wright
University of Texas, Austin

SVP 2014, Berlin: Putting fossils in trees: new methods for combining morphology, time, and molecules to estimate phylogenetic position and divergence times of living and fossil taxa


Note 1: See the Abstracts for "Putting Fossils in Trees Symposium"
Note 2: See the BEASTmasteR code and example scripts!

Putting fossils in trees: new methods for combining morphology, time, and molecules to estimate phylogenetic position and divergence times of living and fossil taxa

Co-Convenors: Nicholas J. Matzke, April Wright, Graeme Lloyd, David W. Bapst

Fossil data are crucial to correct estimation of phylogeny and divergence times. However, most traditional methods artificially separate the analysis of fossil relationships and divergence time analysis. For example, it is common for paleontologists to estimate the topological position of fossils using cladistic or Bayesian methods, either in a morphology-only or “total evidence” analysis. This tree, which is undated, may then be used by molecular biologists to supply calibration distributions for dating a molecules-only tree of living taxa. Such trees form the starting point for various comparative methods which require dated phylogenies, e.g., model-based ancestral state analyses, diversification analyses, or historical biogeography.

Such procedures “throw away” most of the fossil data, treating paleontology as merely a source of calibration points for molecular analyses, and separate the questions of estimating relationships and dating, when in fact they may be linked. However, increasing collaboration between paleontologists, biologists, statisticians and computer scientists has been fruitful in yielding new technologies and techniques that attempt to combine fossil and living morphology, fossil dates, and molecular data in joint analyses. This symposium will be devoted to reviewing, discussing, and critiquing new methods and models for estimating phylogenetic trees and for incorporating fossils in the derivation of divergence times.

The three foci of the symposium are:
1. "Model-based methods: advantages and limitations." This will focus on the assumptions behind the current probabilistic models for morphological and fossil data, the resulting advantages and limitations, and suggestions for improvements.
2. "Fossils as terminal taxa in dating analyses: prospects and challenges." Methods using fossils as terminal taxa in dating analyses are new and mostly unevaluated, so participants will present case studies that give insight into the practical benefits and problems encountered in the use of such methods.
3. "Fossils as dual information sources: morphology and stratigraphy." The stratigraphic range and sampling frequency of clades also give important information about the timing of clade origins. Stratocladistics was an early attempt to take this information into account, but was not widely adopted. Probabilistic methods, as well as advances in fossil databases, may allow improved approaches. Participants will review and critique recent developments in this area.

EvMorph series, University of Chicago, October 2014


Evolutionary Morphology Seminar: Nicholas Matzke, University of Tennessee

When: Thursday, October 9, 2014 7:30–8:30 p.m.

Where: Henry Hinds Laboratory, Room 176

5734 South Ellis Avenue, Chicago, IL

Description: Model Selection in Historical Biogeography: When is Founder-Event Speciation Important?

Contact: Geophysical Sciences



University of Helsinki, January 2015


Title: Model Selection in Historical Biogeography: when is Founder-event Speciation important?

Date: Friday, January 16, 2015: Lecture 2 pm
Time: 2 pm - 2:45 pm
Location: Geosciences Department, University of Helsinki, Finland
Room: TBA
Host: Laura K Säilä, Dept. of Geosciences, University of Helsinki, Finland


New Biogeography Model: Founder-event speciation, where a rare jump dispersal event founds a new genetically isolated lineage, has long been considered crucial by many historical biogeographers, but its importance is disputed within the vicariance school. Probabilistic modeling of geographic range evolution creates the potential to test different biogeographical models against data using standard statistical model choice procedures, as long as multiple models are available. I re-implement the Dispersal-Extinction-Cladogenesis (DEC) model of LAGRANGE in the R package BioGeoBEARS, and modify it to create a new model, DEC+J, which adds founder-event speciation, the importance of which is governed by a new free parameter, j. Both models are shown to be special cases of the "ClaSSE" model.

Simulation tests: The identifiability of DEC and DEC+J is tested on datasets simulated under a wide range of macroevolutionary models where geography evolves jointly with lineage birth/death events. The results confirm that DEC and DEC+J are identifiable even though these models ignore the fact that molecular phylogenies are missing many cladogenesis and extinction events. The simulations also indicate that DEC will have substantially increased errors in ancestral range estimation and parameter inference when the true model includes +J.

Empirical tests: DEC and DEC+J are compared on 13 empirical datasets drawn from studies of island clades. Likelihood ratio tests indicate that all clades reject DEC, and AICc model weights show large to overwhelming support for DEC+J, for the first time verifying the importance of founder-event speciation in island clades via statistical model choice. Under DEC+J, ancestral nodes are usually estimated to have ranges occupying only one island, rather than the widespread ancestors often favored by DEC. These results indicate that the assumptions of historical biogeography models can have large impacts on inference and require testing and comparison with statistical methods.
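The model-choice statistics mentioned above can be sketched as follows, using the standard formulas for AICc, Akaike weights, and a likelihood ratio test with one degree of freedom. The log-likelihoods and the sample size are hypothetical, and using the number of tips as the AICc sample size is an assumption. The parameter counts (2 for DEC: d and e; 3 for DEC+J: d, e, and j) follow the model descriptions in these abstracts.

```python
import math


def aicc(lnl, n_params, n_obs):
    """Small-sample corrected AIC from a maximized log-likelihood."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc needs n_obs > n_params + 1")
    aic = -2.0 * lnl + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def akaike_weights(scores):
    """Convert a list of AIC or AICc scores into relative model weights."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]


def lrt_pvalue_1df(lnl_null, lnl_alt):
    """P-value of a likelihood ratio test for nested models that differ by
    one free parameter (chi-square with 1 degree of freedom)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical fits for one clade with 20 tips.
LNL_DEC, LNL_DECJ, N_TIPS = -40.0, -30.0, 20
scores = [aicc(LNL_DEC, 2, N_TIPS), aicc(LNL_DECJ, 3, N_TIPS)]
weights = akaike_weights(scores)
print("AICc weights (DEC, DEC+J):", [round(w, 4) for w in weights])
print("LRT p-value:", lrt_pvalue_1df(LNL_DEC, LNL_DECJ))
```

DEC is nested within DEC+J (it is the case j = 0), which is what makes the likelihood ratio test applicable, while the AICc weights summarize relative support on a 0 to 1 scale.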

Further applications: Probabilistic modeling in biogeography opens up many possible research applications, including biogeographical stochastic mapping, biogeographical dating, and inclusion of phylogenetic information in species distribution modeling (SDM).

BioGeoBEARS: Help, tutorials and updates on the BioGeoBEARS R package are available at: http://phylo.wikidot.com/biogeobears


IBS 2015, Bayreuth, Germany


Title: Biogeographical Stochastic Mapping: Bayesian estimation of the history and timing of biogeographical events on phylogenies

Nicholas Matzke (2015). "Biogeographical Stochastic Mapping: Bayesian estimation of the history and timing of biogeographical events on phylogenies." Talk at the 2015 Biennial Meeting of the International Biogeography Society. Session: Historical and Paleo-Biogeography. January 10, 2015, 13:30-13:45, H 22, RW II.

Date: Saturday, January 10, 2015
Time: 1:30-1:45 pm

Link: http://www.bayceer.uni-bayreuth.de/ibs2015/en/prog/bayconf/beitrag_detail.php?id_obj=12643

Summary: Traditional likelihood methods in historical biogeography estimate the probability of each geographic range at each node. Usually the most-probable range at each node is plotted, and this is taken to be the approximate history. This is not technically accurate and might be badly misleading in some cases. A solution is stochastic mapping of possible histories on the phylogeny. This has been widely applied in phylogenetics for sequence data and discrete characters, but these character models are inappropriate in historical biogeography, where the state space is much more complex, and geographic range changes through both anagenetic and cladogenetic events. I present a novel algorithm that enables stochastic mapping on any biogeographic model available in BioGeoBEARS, as well as graphical display and statistical summary of the timing and frequency of dispersal and vicariance events. An animation of realizations of possible histories under the DEC and DEC+J models is demonstrated for Hawaiian Psychotria shrubs. R functions and an example script performing stochastic mapping are available at http://phylo.wikidot.com/biogeobears . The functions build upon the R package BioGeoBEARS, available for all platforms at CRAN.
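The idea of sampling one possible history, rather than plotting the most probable range at each node, can be sketched on a single branch. The sketch below is not the algorithm presented in the talk and not BioGeoBEARS code. It uses plain rejection sampling for a continuous-time Markov chain with known start and end ranges, covers anagenetic events only, and omits the null range. The two-area system, the rates, and all function names are hypothetical.

```python
import random

# Hypothetical anagenetic rates for a two-area system (areas A and B).
D_RATE = 0.5  # range expansion (dispersal)
E_RATE = 0.2  # range contraction (local extinction)

# Allowed transitions between ranges and their rates.
RATES = {
    "A": {"AB": D_RATE},
    "B": {"AB": D_RATE},
    "AB": {"A": E_RATE, "B": E_RATE},
}


def simulate_path(rates, start, duration, rng):
    """Simulate one unconditioned history along a branch as a list of
    (time, range) events, starting with (0.0, start)."""
    path = [(0.0, start)]
    time, state = 0.0, start
    while True:
        exits = rates[state]
        total = sum(exits.values())
        if total <= 0.0:
            break
        time += rng.expovariate(total)  # waiting time to the next event
        if time >= duration:
            break
        state = rng.choices(list(exits), weights=list(exits.values()))[0]
        path.append((time, state))
    return path


def sample_history(rates, start, end, duration, rng, max_tries=100000):
    """Rejection sampling: keep the first simulated history that ends
    in the required range at the end of the branch."""
    for _ in range(max_tries):
        path = simulate_path(rates, start, duration, rng)
        if path[-1][1] == end:
            return path
    raise RuntimeError("no matching history found within max_tries")


rng = random.Random(1)
history = sample_history(RATES, "A", "B", 2.0, rng)
for time, state in history:
    print(f"t = {time:.3f}: range {state}")
```

Repeating the draw many times gives a sample of histories from which the timing and frequency of events can be summarized. Rejection sampling becomes inefficient when the end state is improbable, so it serves here only as the simplest correct illustration.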

Contact: matzke@nimbios.org

Supplementary Information: R source code is also archived in this article’s online Supplementary Data. (And here: http://phylo.wikidot.com/biogeobears#stochastic_mapping )

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License