SMBE 2014, Puerto Rico
Primary endosymbiosis events date to the later Proterozoic with cross-calibrated phylogenetic dating of duplicated ATPase proteins
Monday 9th June: Life Technologies Lunchtime Symposium / Posters 1001–1278, 9th June 13.00–15.30
P86
Nicholas Matzke1,2, Patrick Shih3,4
1 National Institute of Mathematical and Biological Synthesis, University of Tennessee, Knoxville, TN, USA,
2 Department of Integrative Biology, University of California, Berkeley, CA, USA,
3 Joint Bioenergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, USA,
4 Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
Chloroplasts and mitochondria descended from bacterial ancestors, but the dating of these primary endosymbiosis events remains very uncertain, despite their importance for our understanding of the evolution of both bacteria and eukaryotes. All phylogenetic dating in the Proterozoic and before is difficult: significant debates surround potential fossil calibration points based on the interpretation of the Precambrian microbial fossil record, and strict molecular clock methods cannot be expected to yield accurate dates over such vast timescales because of strong heterogeneity in rates. Even with more sophisticated relaxed-clock analyses, nodes that are distant from fossil calibrations will have very high uncertainty in dating. However, endosymbiosis events and gene duplications provide some additional information that has never been exploited in dating; namely, that certain nodes on a gene tree must represent the same events, and thus must have the same or very similar dates, even if the exact date is uncertain. We devised techniques to exploit this information: cross-calibration, in which node date calibrations are reused across a phylogeny, and cross-bracing, in which node date calibrations are formally linked in a hierarchical Bayesian model. We apply these methods to proteins with ancient duplications that have remained associated and originated from plastid and mitochondrial endosymbionts: the α and β subunits of ATP synthase and its relatives, and EF-Tu. The methods yield reductions in dating uncertainty of 14–26% while only using date calibrations derived from phylogenetically unambiguous Phanerozoic fossils of multicellular plants and animals. Our results suggest that primary plastid endosymbiosis occurred ~900 Mya and mitochondrial endosymbiosis occurred ~1,200 Mya.
This work is based on:
Shih, Patrick M.; Matzke, Nicholas J. (2013). "Primary endosymbiosis events date to the later Proterozoic with cross-calibrated phylogenetic dating of duplicated ATPase proteins." Proceedings of the National Academy of Sciences, 110(30), 12355–12360.
Also, see writeup of this article by:
Zhaxybayeva, Olga (2013). "Anciently duplicated genes reduce uncertainty in molecular clock estimates." Proceedings of the National Academy of Sciences, 110(30), 12168–12169.
Evolution 2014, Raleigh, NC
Simulation tests of probabilistic models for historical biogeography: DEC and DEC+J
2C_306B Methodology
Date: Sunday, June 22, 2014
Time: 1:30 PM – 2:45 PM
Location: 306 B
Chair: Mario dos Reis
2:00 PM – 2:15 PM
18
Nicholas Matzke, NIMBioS (Presenter)
Contributed Presentation
Several phylogenetic models for historical biogeography are in widespread use, e.g. character mapping, Dispersal-Vicariance Analysis (DIVA), and Dispersal-Extinction-Cladogenesis (DEC). In addition, new models have become available: BayArea, and the variety of models implemented in the R package BioGeoBEARS. These include DEC+J (which adds founder-event speciation to DEC) and DIVALIKE (a likelihood interpretation of DIVA; a DIVALIKE+J model is also available). There has been very little testing of biogeographical models against simulated data in the situation where the true model is substantially different from the assumed inference model. Also, all of the above models assume that the observed tree is the true tree, ignoring possible missing speciation/extinction events and any dependence of speciation/extinction rates on geographic range. These possibilities are taken into account by the GeoSSE and ClaSSE models, but at the cost of many more free parameters, which may strain typically small biogeographic datasets. To test the accuracy of DEC and DEC+J inference on datasets simulated under different biogeographical and SSE models, I jointly simulated phylogenies and geographic range under 6 macroevolutionary models. The first three assumed speciation/extinction were independent of geographic range: (1) Yule process (pure-birth, no extinction); (2) Birth-Death (BD) process with extinction rate 1/3 of the speciation rate; (3) BD process with extinction equal to speciation. The next three assumed an SSE model where the base speciation rate was multiplied by the number of areas occupied, and the base extinction rate was divided by the number of areas occupied. This produced (4) SSE with speciation but zero extinction rate; (5) SSE with the base extinction rate 1/3 of the speciation rate; and (6) SSE with base rates of speciation and extinction equal.
For each of the 6 macroevolutionary models, all combinations of low/middle/high values were used for these biogeographic parameters: d (rate of range expansion), e (rate of range contraction), and j (relative weight of founder events versus traditional DEC cladogenesis events at speciation). The datasets (138 parameter combinations, 100 simulations each with 50 living species) were subjected to inference under DEC and DEC+J. DEC and DEC+J were distinguishable under all 6 models, except when j was very small and d very high. Inference under DEC artificially raises d and e when DEC+J is the true model, and shows significantly reduced accuracy in inferring ancestral ranges. These results indicate that the preference for DEC+J over DEC in many empirical datasets is not likely to be an artefact of missing SSE processes.
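The simulation design described above can be sketched in a few lines of Python. This is only an illustration of the stated design, not the code used for the study: the names `MACRO_MODELS`, `LEVELS`, `lineage_rates`, and `parameter_grid` are hypothetical, and the low/middle/high values are placeholders because the abstract does not report the numbers used.

```python
from itertools import product

# The six macroevolutionary models from the abstract:
# (label, do rates depend on range size?, extinction as a fraction of speciation)
MACRO_MODELS = [
    ("Yule (pure-birth)", False, 0.0),
    ("BD, extinction = speciation / 3", False, 1.0 / 3.0),
    ("BD, extinction = speciation", False, 1.0),
    ("SSE, zero extinction", True, 0.0),
    ("SSE, base extinction = speciation / 3", True, 1.0 / 3.0),
    ("SSE, base extinction = speciation", True, 1.0),
]

# ASSUMPTION: placeholder low/middle/high values, not the values used in the study.
LEVELS = {
    "d": (0.01, 0.1, 1.0),  # rate of range expansion
    "e": (0.01, 0.1, 1.0),  # rate of range contraction
    "j": (0.0, 0.1, 0.5),   # relative weight of founder events
}


def lineage_rates(base_speciation, extinction_fraction, n_areas, range_dependent):
    """Return (speciation, extinction) rates for a lineage occupying n_areas areas."""
    base_extinction = base_speciation * extinction_fraction
    if not range_dependent:
        return base_speciation, base_extinction
    # SSE variant: speciation scales up, extinction scales down, with range size.
    return base_speciation * n_areas, base_extinction / n_areas


def parameter_grid():
    """All combinations of low/middle/high values for d, e, and j."""
    combos = product(LEVELS["d"], LEVELS["e"], LEVELS["j"])
    return [dict(zip("dej", combo)) for combo in combos]
```

A full grid gives 27 combinations of d, e, and j per macroevolutionary model. The abstract reports 138 parameter combinations in total, so the actual design evidently differed from a plain 6 × 27 grid in ways the abstract does not describe.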
Non-null effects of a null range: Exploring parameter estimation in the dispersal-extinction-cladogenesis model
1324
3A_301A Phylogenetic Methods
Date: Monday, June 23, 2014
Time: 8:30 AM – 9:45 AM
Location: 301 A
Chair: Elizabeth Wade
9:15 AM – 9:30 AM
Kathryn Massana, University of Tennessee, Knoxville (Presenter)
Jeremy Beaulieu, NIMBioS
Brian O'Meara
Nicholas Matzke, NIMBioS
Parametric models in historical biogeography that integrate geographic ranges and phylogenies have proven extremely informative for understanding the geographic range evolution of taxa. One such approach is the dispersal-extinction-cladogenesis (DEC) model, which has been widely used in empirical analyses of the evolution of geographic range using discrete area states. However, local extinction rates are difficult to estimate well in this model. We explore the cause of this as well as a potential solution.
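For readers unfamiliar with DEC, the sketch below builds the along-branch rate matrix for a small number of areas. It assumes the commonly described formulation with a single dispersal rate d and a single local extinction rate e shared by all areas; the abstract itself gives no equations, and the function names `all_ranges` and `dec_rate_matrix` are hypothetical.

```python
from itertools import combinations


def all_ranges(n_areas):
    """All 2**n_areas geographic ranges, including the empty (null) range."""
    return [frozenset(c)
            for k in range(n_areas + 1)
            for c in combinations(range(n_areas), k)]


def dec_rate_matrix(n_areas, d, e):
    """Build a DEC-style instantaneous rate matrix (ASSUMED equal rates among areas).

    - A range loses one occupied area at rate e (local extinction).
    - A non-empty range gains one unoccupied area at rate d times the number
      of areas currently occupied (dispersal from each occupied area).
    - The null range has no outgoing transitions.
    """
    states = all_ranges(n_areas)
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    Q = [[0.0] * n for _ in range(n)]
    for s in states:
        i = index[s]
        for area in range(n_areas):
            if area in s:
                Q[i][index[s - {area}]] += e
            elif s:
                Q[i][index[s | {area}]] += d * len(s)
        # Diagonal entry makes each row sum to zero.
        Q[i][i] = -sum(Q[i])
    return states, Q
```

In this sketch the row for the null range is all zeros, so once a lineage reaches that state it cannot leave it, and a single-area range moves to the null range at rate e.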
rexpokit and cladoRcpp: R packages integrating FORTRAN and C++ for faster matrix exponentiation and likelihood calculations in historical biogeography
ievobioE_402 iEvoBio software bazaar
Open-source software demos and reception for iEvoBio
Date: Tuesday, June 24, 2014
Time: 3:15 PM – 5:00 PM
Location: 402
1219
Drew Schmidt, University of Tennessee
Nicholas Matzke, NIMBioS (Presenter)
iEvoBio Software Demo
Probabilistic models for phylogeny-based inference of historical biogeography face several computational challenges, especially when programmed in R. The first is large state spaces: for example, an analysis using 10 discrete geographic areas has 2^10 = 1024 possible combinations of presence/absence across the areas, and a transition matrix that is 1024 × 1024. Exponentiating this matrix is extremely slow in standard R matrix exponentiation routines. The R package "rexpokit" integrates the FORTRAN EXPOKIT library, making exponentiation of such large matrices feasible, although not rapid. Further speed improvements are made by parallel processing. A second challenge is enumerating and assigning probabilities to different biogeographical events at cladogenesis. Here, a naive implementation would have to examine every possible combination of ancestor state, left descendant state, and right descendant state, which would be 1023^3, or over 1 billion combinations. Here I made great improvements in speed with algorithms that eliminate impossible combinations a priori, and use of Rcpp for all for-loops. This is implemented in the R package cladoRcpp. I will present quick demonstrations of these calculations and the resulting speedups, and suggest that these packages can serve as relatively simple examples for researchers wishing to integrate FORTRAN or C++ into their R programming.
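The state-space arithmetic above, and the idea of ruling out impossible cladogenesis events before enumerating them, can be illustrated with a short Python sketch. It is not the algorithm implemented in cladoRcpp: the rule set below is a simplified, assumed DEC-style scheme (a single-area ancestor is copied to both descendants; a widespread ancestor gives one descendant a single area and the other either the full ancestral range or the remainder), and all function names are hypothetical.

```python
from itertools import combinations


def enumerate_ranges(n_areas):
    """All non-empty geographic ranges for n_areas areas (2**n_areas - 1 of them)."""
    return [frozenset(c)
            for k in range(1, n_areas + 1)
            for c in combinations(range(n_areas), k)]


def naive_triple_count(n_areas):
    """Number of (ancestor, left, right) triples a brute-force loop would visit."""
    return (2 ** n_areas - 1) ** 3


def allowed_cladogenesis_events(n_areas):
    """Generate only the triples allowed under the ASSUMED simplified rules."""
    events = set()
    for ancestor in enumerate_ranges(n_areas):
        if len(ancestor) == 1:
            # Single-area ancestor: both descendants inherit that area.
            events.add((ancestor, ancestor, ancestor))
            continue
        for area in ancestor:
            single = frozenset([area])
            # The other descendant keeps the whole range or gets the remainder.
            for other in (ancestor, ancestor - single):
                events.add((ancestor, single, other))
                events.add((ancestor, other, single))
    return events


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, naive_triple_count(n), len(allowed_cladogenesis_events(n)))
```

Generating allowed events directly from each ancestral range, rather than testing every triple, is what keeps the count far below the naive cube of the number of states.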
rexpokit: http://cran.r-project.org/web/packages/rexpokit/index.html
cladoRcpp: http://cran.r-project.org/web/packages/cladoRcpp/index.html
Used for historical biogeography by BioGeoBEARS (BioGeography with Bayesian (and Likelihood) Evolutionary Analysis in R Scripts): http://cran.r-project.org/web/packages/BioGeoBEARS/index.html
Examples, updates, and help listserv are at PhyloWiki: http://phylo.wikidot.com/biogeobears
License: GNU General Public License version 3.0 (GPL-3.0), http://opensource.org/licenses/GPL-3.0