Abstracts For Presentations By Nicholas J Matzke

SMBE 2014, Puerto Rico

Primary endosymbiosis events date to the later Proterozoic with cross-calibrated phylogenetic dating of duplicated ATPase proteins

Monday 9th June: Life Technologies Lunchtime Symposium / Posters 1001 - 1278 - 9th June 13.00 - 15.30

P-86

Nicholas Matzke1,2, Patrick Shih3,4

1 National Institute for Mathematical and Biological Synthesis, University of Tennessee, Knoxville, TN, USA,

2 Department of Integrative Biology, University of California, Berkeley, CA, USA,

3 Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, USA,

4 Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA

Chloroplasts and mitochondria descended from bacterial ancestors, but the dating of these primary endosymbiosis events remains very uncertain, despite their importance for our understanding of the evolution of both bacteria and eukaryotes. All phylogenetic dating in the Proterozoic and before is difficult: significant debates surround potential fossil calibration points based on the interpretation of the Precambrian microbial fossil record, and strict molecular clock methods cannot be expected to yield accurate dates over such vast timescales because of strong heterogeneity in rates. Even with more sophisticated relaxed-clock analyses, nodes that are distant from fossil calibrations will have a very high uncertainty in dating. However, endosymbiosis events and gene duplications provide some additional information that has never been exploited in dating; namely, that certain nodes on a gene tree must represent the same events, and thus must have the same or very similar dates, even if the exact date is uncertain. We devised techniques to exploit this information: cross-calibration, in which node date calibrations are reused across a phylogeny, and cross-bracing, in which node date calibrations are formally linked in a hierarchical Bayesian model. We apply these methods to proteins that have ancient duplications, have remained associated, and originated from plastid and mitochondrial endosymbionts: the α and β subunits of ATP synthase and its relatives, and the elongation factor EF-Tu. The methods yield reductions in dating uncertainty of 14–26% while only using date calibrations derived from phylogenetically unambiguous Phanerozoic fossils of multicellular plants and animals. Our results suggest that primary plastid endosymbiosis occurred ~900 Mya and mitochondrial endosymbiosis occurred ~1,200 Mya.
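The intuition behind linking nodes that represent the same event can be illustrated with a toy calculation. The sketch below is not the cross-bracing model from the abstract (which is a hierarchical Bayesian model fitted in a dating analysis); it only assumes two independent, normally distributed estimates of one shared date and pools them by inverse-variance weighting. The function name and the example ages are hypothetical.

```python
import math


def combine_linked_estimates(mean_a, sd_a, mean_b, sd_b):
    """Pool two independent normal estimates of one shared date.

    Inverse-variance (precision) weighting: a toy stand-in for linking
    two node dates, NOT the hierarchical Bayesian model itself.
    """
    precision_a = 1.0 / sd_a ** 2
    precision_b = 1.0 / sd_b ** 2
    pooled_precision = precision_a + precision_b
    pooled_mean = (mean_a * precision_a + mean_b * precision_b) / pooled_precision
    pooled_sd = math.sqrt(1.0 / pooled_precision)
    return pooled_mean, pooled_sd


# Hypothetical ages (Mya) for the same duplication event as estimated
# from two paralogous subtrees; these numbers are invented.
mean, sd = combine_linked_estimates(950.0, 150.0, 850.0, 200.0)
reduction = 1.0 - sd / min(150.0, 200.0)
print(f"pooled age: {mean:.0f} Mya, sd: {sd:.0f}, sd reduction: {reduction:.0%}")
```

Under these assumptions the pooled standard deviation is always smaller than either input, which is the qualitative effect the abstract describes; the size of the reduction in a real analysis depends on the tree, the clock model, and the calibrations.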

See poster:

PDF of poster for SMBE 2014

This work is based on:

Shih, Patrick M.; Matzke, Nicholas J. (2013). "Primary endosymbiosis events date to the later Proterozoic with cross-calibrated phylogenetic dating of duplicated ATPase proteins." Proceedings of the National Academy of Sciences, 110(30), 12355–12360.

Also, see writeup of this article by:

Zhaxybayeva, Olga (2013). "Anciently duplicated genes reduce uncertainty in molecular clock estimates." Proceedings of the National Academy of Sciences, 110(30), 12168–12169.

Evolution 2014, Raleigh, NC

Simulation tests of probabilistic models for historical biogeography: DEC and DEC+J

PDF of draft talk

2C_306B Methodology
Date: Sunday, June 22, 2014
Time: 1:30 PM - 2:45 PM
Location: 306 B
Chair: Mario dos Reis

2:00 PM - 2:15 PM

18

Nicholas Matzke, NIMBioS, matzke@nimbios.org (Presenter)

Contributed Presentation

Several phylogenetic models for historical biogeography are in widespread use, e.g. character mapping, Dispersal-Vicariance Analysis (DIVA), and Dispersal-Extinction-Cladogenesis (DEC). In addition, new models have become available: BayArea, and the variety of models implemented in the R package BioGeoBEARS. These include DEC+J (which adds founder-event speciation to DEC) and DIVALIKE (a likelihood interpretation of DIVA; a DIVALIKE+J model is also available). There has been very little testing of biogeographical models against simulated data when the true model is substantially different from the assumed inference model. Also, all of the above models assume that the observed tree is the true tree, ignoring possible missing speciation/extinction events and any dependence of speciation/extinction rates on geographic range. These possibilities are taken into account by the GeoSSE and ClaSSE models, but at the cost of many more free parameters, which may strain typically small biogeographic datasets. To test the accuracy of DEC and DEC+J inference on datasets simulated under different biogeographical and SSE models, I jointly simulated phylogenies and geographic range under 6 macroevolutionary models. The first three assumed speciation/extinction were independent of geographic range: (1) Yule process (pure-birth, no extinction); (2) Birth-Death (BD) process with extinction rate 1/3 of the speciation rate; (3) BD process with extinction equal to speciation. The next three assumed an SSE model where the base speciation rate was multiplied by the number of areas occupied, and the base extinction rate was divided by the number of areas occupied. This produced (4) SSE with speciation but zero extinction rate; (5) SSE with the base extinction rate 1/3 of the speciation rate; and (6) SSE with base rates of speciation and extinction equal.
For each of the 6 macroevolutionary models, all combinations of low/middle/high values were used for these biogeographic parameters: d (rate of range-expansion), e (rate of range-contraction), and j (relative weight of founder-events versus traditional DEC cladogenesis events at speciation). The datasets (138 parameter combinations, 100 simulations each with 50 living species) were subjected to inference under DEC and DEC+J. DEC and DEC+J were distinguishable under all 6 models, except when j was very small and d very high. DEC artificially raises d and e when DEC+J is the true model, and shows significantly reduced accuracy in inferring ancestral range. These results indicate that the fact that DEC+J is favored over DEC by many empirical datasets is not likely to be an artefact of missing SSE processes.
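The six macroevolutionary models can be summarized as a small lookup of per-lineage rates. The following is a minimal sketch of the rate rules as stated in the abstract (extinction as a fraction of speciation; for the SSE models, speciation multiplied and extinction divided by the number of areas occupied). The function names, the table layout, and the base speciation rate of 0.3 are hypothetical; the abstract does not give the rate values used, and this is not the simulation code itself.

```python
def range_dependent_rates(base_speciation, base_extinction, n_areas):
    """SSE rule from the abstract: speciation scales up with the number
    of areas occupied, extinction scales down."""
    if n_areas < 1:
        raise ValueError("a living lineage must occupy at least one area")
    return base_speciation * n_areas, base_extinction / n_areas


# model id -> (label, extinction as a fraction of speciation, range-dependent?)
MODELS = {
    1: ("Yule", 0.0, False),
    2: ("BD, extinction = 1/3 speciation", 1.0 / 3.0, False),
    3: ("BD, extinction = speciation", 1.0, False),
    4: ("SSE, zero extinction", 0.0, True),
    5: ("SSE, base extinction = 1/3 speciation", 1.0 / 3.0, True),
    6: ("SSE, base extinction = speciation", 1.0, True),
}


def lineage_rates(model_id, base_speciation, n_areas):
    """Return (speciation, extinction) for a lineage occupying n_areas."""
    _label, extinction_fraction, range_dependent = MODELS[model_id]
    base_extinction = extinction_fraction * base_speciation
    if range_dependent:
        return range_dependent_rates(base_speciation, base_extinction, n_areas)
    return base_speciation, base_extinction


# Hypothetical base speciation rate; compare a 1-area and a 3-area lineage.
for model_id in sorted(MODELS):
    label = MODELS[model_id][0]
    print(label, lineage_rates(model_id, 0.3, 1), lineage_rates(model_id, 0.3, 3))
```

For a lineage occupying a single area, the SSE models reduce to their range-independent counterparts; the difference only appears for widespread lineages.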

Non-null effects of a null range: Exploring parameter estimation in the dispersal-extinction-cladogenesis model

1324

3A_301A Phylogenetic Methods
Date: Monday, June 23, 2014
Time: 8:30 AM - 9:45 AM

Location: 301 A
Chair: Elizabeth Wade
9:15 AM - 9:30 AM

Kathryn Massana, University of Tennessee, Knoxville, kmassana@utk.edu (Presenter)
Jeremy Beaulieu, NIMBioS, jbeaulieu@nimbios.org
Brian O'Meara, bomeara@utk.edu
Nicholas Matzke, NIMBioS, matzke@nimbios.org

Parametric models in historical biogeography that integrate geographic ranges and phylogenies have been shown to be extremely informative for understanding the geographic range evolution of taxa. One such approach is the dispersal-extinction-cladogenesis (DEC) model, which has been widely used in empirical analyses of the evolution of geographic range using discrete area states. However, local extinction rates are difficult to estimate well in this model. We explore the cause of this as well as a potential solution.

rexpokit and cladoRcpp: R packages integrating FORTRAN and C++ for faster matrix exponentiation and likelihood calculations in historical biogeography

PDF of draft talk

ievobioE_402 iEvoBio software bazaar

Open-source software demos and reception for iEvoBio

Date: Tuesday, June 24, 2014

Time: 3:15 PM - 5:00 PM

Location: 402

rexpokit and cladoRcpp: R packages integrating FORTRAN and C++ for faster matrix exponentiation and likelihood calculations in historical biogeography (1219)
Nicholas Matzke, NIMBioS (United States)
Drew Schmidt, University of Tennessee

1219

Drew Schmidt, University of Tennessee, schmidt@math.utk.edu
Nicholas Matzke, NIMBioS, matzke@nimbios.org (Presenter)

iEvoBio Software Demo

Probabilistic models for phylogeny-based inference of historical biogeography face several computational challenges, especially when programmed in R. The first is large state spaces: for example, an analysis using 10 discrete geographic areas has 2^10 = 1024 possible combinations of presence/absence in each region, and a transition matrix that is 1024×1024. Exponentiating this matrix is extremely slow in standard R matrix exponentiation routines. The R package "rexpokit" integrates the FORTRAN EXPOKIT library, making exponentiation of such large matrices feasible, although not rapid. Further speed improvements are made by parallel processing. A second challenge is enumerating and assigning probabilities to different biogeographical events at cladogenesis. A naive implementation would have to examine every possible combination of ancestor state, left descendant state, and right descendant state, which would be 1023^3, or over 1 billion combinations. I made great improvements in speed with algorithms that eliminate impossible combinations a priori, and by using Rcpp for all for-loops. This is implemented in the R package cladoRcpp. I will present quick demonstrations of these calculations and the resulting speedups, and suggest that these packages can serve as relatively simple examples for researchers wishing to integrate FORTRAN or C++ into their R programming.
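The state-space arithmetic in this abstract can be reproduced with a few lines of code. The sketch below is written in Python for illustration only (the packages themselves are R with FORTRAN and C++), and it is not the cladoRcpp algorithm. The function names are hypothetical, and the pruning rule shown (both daughter ranges must be non-empty subsets of the ancestral range, as in a DEC-like model without founder events) is an assumption chosen to show how ruling out impossible combinations a priori shrinks the search.

```python
from itertools import combinations


def enumerate_ranges(n_areas, include_null=False):
    """Return every geographic range as a frozenset of area indices."""
    min_size = 0 if include_null else 1
    return [
        frozenset(combo)
        for size in range(min_size, n_areas + 1)
        for combo in combinations(range(n_areas), size)
    ]


def count_subset_triples(ranges):
    """Count (ancestor, left, right) triples in which both daughters are
    non-empty subsets of the ancestor (illustrative pruning rule only)."""
    total = 0
    for ancestor in ranges:
        n_nonempty_subsets = 2 ** len(ancestor) - 1
        total += n_nonempty_subsets ** 2
    return total


n_areas = 10
all_states = enumerate_ranges(n_areas, include_null=True)  # 2^10 states
ranges = enumerate_ranges(n_areas)                         # null range excluded
naive_triples = len(ranges) ** 3                           # 1023^3
pruned_triples = count_subset_triples(ranges)

print("states including null range:", len(all_states))
print("naive cladogenesis triples:", naive_triples)
print("triples left after subset pruning:", pruned_triples)
```

Even this simple rule removes the large majority of triples before any probability is computed. A model with founder-event speciation (+J) allows a daughter range outside the ancestral range, so it would need a different rule.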

rexpokit: http://cran.r-project.org/web/packages/rexpokit/index.html
cladoRcpp: http://cran.r-project.org/web/packages/cladoRcpp/index.html
Used for historical biogeography by BioGeoBEARS (BioGeography with Bayesian (and Likelihood) Evolutionary Analysis in R Scripts): http://cran.r-project.org/web/packages/BioGeoBEARS/index.html
Examples, updates, and help listserv are at PhyloWiki: http://phylo.wikidot.com/biogeobears

GNU General Public License version 3.0 (GPL-3.0) http://opensource.org/licenses/GPL-3.0
