Tree Thinking With R

Introduction

This the code for a lab for the University of Auckland's BioSci 109 course introducing Ecology and Evolution. All code should run "out of the box" by copying-and-pasting into R. (You may prefer to copy-paste the code into RStudio's script editing window, as this color-codes the code, making it easier to read. Each code chunk can then be executed by highlight it and clicking "run", or hitting something like command-enter (depends on keyboard/operating system).

Code - Introduction to Tree-Thinking in R, using phylogenies of tetrapods

#######################################################
# BIOSCI 109 Lab: Introduction to Tree-Thinking in R,
# using phylogenies of tetrapods
# 
# by: Nick Matzke, 2019
# School of Biological Sciences
# University of Auckland
#######################################################

#######################################################
# PART I.   SET-UP
# PART II.  ULTRA-SHORT CRASH COURSE IN R
# PART III. INTRODUCTION TO TREES IN R
# PART IV.  VERTEBRATE PHYLOGENY
# PART V.   MAPPING CHARACTER EVOLUTION ON A PHYLOGENY:
#           FEATHERED FLIGHT AND VIVIPARITY
# PART VI.  THE PHYLOGENY AND BIOGEOGRAPHY OF KIWIS, MOAS, 
#           AND RELATIVES (BONUS MATERIAL, OPTIONAL)
#######################################################

#######################################################
# Summary:
#
# In this lab we will learn about "tree-thinking" -- 
# how to read evolutionary trees, and how to think 
# about evolution in terms of different depictions
# of evolutionary trees.
# 
# The lab will do this using "R", an open-source 
# statistical computing language that is now used 
# in science and statistics worldwide. NO PREVIOUS
# KNOWLEDGE OF R IS REQUIRED -- all of the commands
# used here can be copy-pasted from this script
# into R.  
#
# NOTE: Phylogenetics and R are two of the areas 
# for which New Zealand and the University of 
# Auckland are well-known in science. For example,
# the original publication of R, Ihaka & 
# Gentleman (1996), now has over 10,000 citations
# in the scientific literature. The authors, Ross
# Ihaka and Robert Gentleman, are from the 
# Department of Statistics at the 
# University of Auckland.
#
# Assessment: 
# 
# All tree-thinking material (the specific 
# trees, the conceptual understanding of trees, etc.) 
# WILL BE FAIR GAME for assessments in this course,
# and it will also be assumed knowledge for future
# courses involving evolution.
#
# The "R" material WILL NOT be assessed -- however,
# learning R can be a very useful skill for data
# analysis, scientific graphics, and statistics.
# Hopefully, by getting a taste of R here, you will 
# be encouraged to develop your R skills in future
# courses and projects. (Note: knowing R can help
# you get a job!)
# 
#######################################################

##############################################################
# LICENSE:
# 
# Free to use/redistribute under:
# Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) 
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the above license, linked here:
# 
# http://creativecommons.org/licenses/by-nc-sa/3.0/
# 
# Summary:
#
# You are free:
#
#   * to Share — to copy, distribute and transmit the work
#   * to Remix — to adapt the work
#
# Under the following conditions:
#
#   * Attribution — You must attribute the work in the manner 
#     specified by the author or licensor (but not in any way that 
#     suggests that they endorse you or your use of the work).
#   * Noncommercial — You may not use this work for commercial purposes. 
#
#   * Share Alike — If you alter, transform, or build upon this work,
#     you may distribute the resulting work only under the same or
#     similar license to this one. 
#
# http://creativecommons.org/licenses/by-nc-sa/3.0/
# 
###################################################################

###################################################################
#
# Please link/cite if you use this, please email me if you have 
#  thoughts/improvements/corrections: n.matzkeATauckland.ac.nz
#
# Citation: Matzke, Nicholas J. (2019). Introduction to 
# Tree-Thinking in R, using phylogenies of tetrapods and 
# paleognath birds. Lab for BIOSCI 109 at the University of 
# Auckland, New Zealand.
#
###################################################################

###################################################################
# PART I. SET-UP
###################################################################
# 
# A. Installation of R
# 
# Your lab computers should already have R installed. Find it and
# open it. 
# 
# If you are doing the lab on your own laptop or another machine, 
# you will have to install R first.  Because R is free and open-
# source, this is easy.
# 
# Two options:
# A1. Basic (terminal) R, or R.app for Macs:
#     Go to: https://cloud.r-project.org/ 
# 
# A2. RStudio:
#     https://www.rstudio.com/products/rstudio/download/
# 
# B. Installation of R packages
# 
# R "packages" allow scientists to perform specialist analyses.
# R packages are also open-source and shared freely worldwide.
# Here, we will be using 3 packages designed for phylogenetics:
#
# 1. APE ("Analysis of Phylogenetics and Evolution")
# 2. phytools (phylogeny tools)
# 3. phangorn (yes, this is a nerd pun)
# 
# These packages are already installed on the lab computers.
# On other computers, you can install them by typing these 
# commands in R:
# 
# install.packages("ape")
# install.packages("phytools")
# install.packages("phangorn")
# 
###################################################################

###################################################################
# PART II. ULTRA-SHORT CRASH COURSE IN R
###################################################################
# R has a "command-line interface", rather than a "point-and-click"
# interface. Although "point-and-click" is easier at first, 
# command-line is better for science, because:
#
# (a) you can save all your commands in a text file, and repeat 
# the analysis easily by copying and pasting the saved commands; 
# this is much easier than trying to remember all the clicks you 
# did, in what order.
#
# (b) these saved commands constitute an "R script" that you can 
# share with other scientists, who can modify and improve the 
# script. This lab is in the form of an R script.
# 
# (c) this is an easy form of beginning programming; eventually,
# it leads to writing your own programs and R packages, thus 
# earning fame and fortune.
# 
# We will not attempt a complete introduction to R here, just 
# the absolute basics. 
#
# COMMENTS
# 
# Any line that starts with a "#" is a comment. Comments, when
# pasted into R, do nothing.
# 
# Comments are where the script describes what is happening.
# Well-commented code is valuable because other users, or 
# your future self, can more easily figure out what each 
# piece of code is doing.
# 
# BASIC MATH
# 
# Type these commands into R to see R do math:
1+1
30/3
5*3

# STORING AND RETRIEVING VARIABLES
# 
# Variables can store numeric data
# Typing the name of the variable returns what is stored:
var1 = 1+1
var1

# Variables can also store character data
var2 = "Hello, world!"
var2

# LOGICAL TESTS
#
# The "==" operator tests if two things are equal,
# and returns TRUE or FALSE
1 == 1
var1 == 1
var1 == 2

# R can also do greater than, less than, not equal, etc.
1 < 2
2 > 1
1 != 1
1 != 2

# VARIABLES STORING ARRAYS
# A variable can store multiple items; this is a "vector"
# or "array".  Let's store a list of species names:
var3 = c("human", "chimp", "gorilla", "orangutan")
var3

# Items in an array can be accessed with square brackets: [ ]
var3[1]
var3[2]
var3[3]
var3[4]

# FOR-LOOPS
# A "for-loop" can be used to loop through an array, for
# example to print or to search for a match:

# ( copy-and-paste from "for" to "}" )
# Print each item in var3:
for (i in 1:4)
    {
    print(var3[i])
    }

# PRINT ONLY MATCHES, WITH IF-ELSE STATEMENT
# * "If-else" statements allow the code to
#   perform an action, *if* a particular
#   TRUE/FALSE statement is true. 
#
# * Here, the variable "i" iterates through
#   1, 2, 3, and 4.  For each i, var3 is 
#   checked to see if it equals "human". 
#   If so, it prints "match found". If not,
#   it prints "match not found".
#
# * If-else statements allow more complex 
#   analyses.
#
# ( copy-and-paste from "for" to the last "}" )
for (i in 1:4)
    { 
    print(var3[i])
    if (var3[i] == "human")
        {
        print("match found!")
        } else {
        print("match not found!")
        }
    }

# FUNCTIONS
#
# Functions are pre-written code that takes an input
# and produces an output.  All functions end with 
# round brackets -- "(" and ")" -- which is where
# the inputs go.
# 
# We saw an example already, with the print() 
# function:
print(var3)
print(var3[1])

# Many R functions are statistical. For example,
# random number generation:

# Draw a random number from a normal distribution
# with a mean of 1.0, standard deviation of 0.1
rnorm(n=1, mean=1.0, sd=0.1)

# Do it several times
rnorm(n=1, mean=1.0, sd=0.1)
rnorm(n=1, mean=1.0, sd=0.1)
rnorm(n=1, mean=1.0, sd=0.1)
rnorm(n=1, mean=1.0, sd=0.1)
rnorm(n=1, mean=1.0, sd=0.1)

# Generate 100 such numbers and store in var4
var4 = rnorm(n=100, mean=1.0, sd=0.1)

# R functions can also do statistical plots
# Here is a histogram plot of var4

# Looking at var4 as a list of nubmers is difficult
var4

# But, looking at a histogram is easy:
hist(var4)

# Repeat the histogram with 10,000 samples from 
# a normal distribution. Does this look
# closer to the theoretical expectation for
# a normal distribution (a "bell curve")?
var5 = rnorm(n=10000, mean=1.0, sd=0.1)
hist(var5)

# LIBRARY-ING R PACKAGES, TO ACCESS MORE FUNCTIONS
# 
# R packages store functions for specific scientific
# purposes.  For example, just like R can generate
# random numbers, R can generate random phylogenetic
# trees, with the "rtree()" function:

# Paste into R:
tr = rtree(n=10, rooted=TRUE)

# Oops, that produced an error! The "rtree" (random tree)
# function is part of the "ape" package. So we have to 
# library() the ape package first:

library(ape)
tr = rtree(n=10, rooted=TRUE)

# No error that time, since we did "library(ape)".
# Type "tr" to see what is in the variable "tr":
tr

# That's a tree with 10 tips!  Let's see what it 
# looks like, with the plot() function:
plot(tr)

# Let's add a title
title("This is a randomly-generated phylogenetic tree, with 10 tips.")

# Let's make a bigger tree, with 40 tips:
tr = rtree(n=40, rooted=TRUE)
plot(tr)
title("This is a randomly-generated phylogenetic tree, with 40 tips.")

##########################################################
# CONCLUSION TO R CRASH COURSE
#
# That's enough basic R for now.  The rest of the script 
# will use various R functions to load and display 
# phylogenies. Your job will be to copy-and-paste
# the commands, look at the tree graphic, and 
# learn about the different ways phylogenetic trees
# are displayed, interpreted -- and mis-interpreted!
##########################################################

#######################################################
#######################################################
# INSTRUCTIONS: COPYING-AND-PASTING CODE CHUNKS
#######################################################
#######################################################
#
# For small commands, it is fine to paste in the
# commands one, or a few, lines at a time.
#
# However, for larger plots, it is best to paste 
# in "chunks" of code. To make it clear where to
# do this, we have labeled the "code chunks" like this:
# 
#######################################################
# START CODE CHUNK #1
#######################################################
#
# (some R code)
#
#######################################################
# END CODE CHUNK #1
#######################################################
#
# When you see a code chunk, copy-and-paste the 
# whole chunk at once.
#
# There are 17 labeled code chunks in the lab exercise,
# or 25 if the bonus material in part 6 is included.
#######################################################

#######################################################
# PART III. INTRODUCTION TO TREES IN R
#######################################################

# Let's make sure our R packages are library()-ed:
library(ape)
library(phytools)
library(phangorn)

# A. Phylogenies: Basic terminology

#######################################################
# START CODE CHUNK #1
#######################################################

# Let's load a simple phylogeny
great_ape_newick_string = "(((human:6,chimpanzee:6):1,gorilla:7):5,orangutan:12);"
great_ape_phylogeny = read.tree(file="", text=great_ape_newick_string)
great_ape_phylogeny

# And plot the phylogeny
plot(great_ape_phylogeny)
axisPhylo()
mtext(side=1, text="Millions of years ago", line=2)
title("Simple phylogeny of the great apes")

#######################################################
# END CODE CHUNK #1
#######################################################

# Terminology: 
# Phylogenies have several parts:
# 1. Nodes. There are 3 kinds of nodes:
#    a. "Tip node", or "leaf" -- these are the "tips" of 
#       the tree. Each tip represents a species.
#    b. "Internal node" -- these are the branching 
#       points of the tree. Each represents a 
#       speciation event.
#    c. "Root node". This is the internal node at the 
#       "bottom" of the tree, i.e. the "roots" of
#       the tree.
# 
# 2. Branches or edges. The branches are the lines
#    connecting the nodes.
#    a. As you can see, some branches have different
#       lengths. However (KEY POINT), whether or not
#       the displayed branch lengths are meaningful
#       depends on what kind of phylogeny is being 
#       displayed.
#    b. In CLADOGRAMS, the branch lengths are 
#       not meaningful. Cladograms just display
#       the hierarchical pattern of relationships
#       between species, also known as the 
#       "tree topology".  In cladograms, 
#       the branches may differ in length, but 
#       this is done only for graphical 
#       convenience.
#    b. In PHYLOGRAMS, the branch lengths are 
#       meaningful. The branch lengths can
#       represent:
#       - Amount of molecular change (e.g., the
#         amount of DNA change)
#       - Amount of morphological change (e.g.,
#         the number of changes in discrete
#         characters)
#       - Amount of time elapsed. 
#
# 3. Tip names, or "tip labels". Typically tip
#    names are the common or scientific names of
#    the species being studied, and the tree is a 
#    tree of species, but tips can also be 
#    gene names, specimen names, populations,
#    languages, or anything else that one 
#    might want to study the phylogenetic
#    history of.
#
# 4. Phylogenetic graphics are scientific figures,
#    often used in the figures in textbooks, and
#    in scientific publications. As such they 
#    often have additional information beyond
#    just the nodes, branches, and tip labels
#    of the phylogeny. These can include:
#    
#    a. Scale bar or timescale, showing the 
#       units of branch length
#    b. Axis label
#    c. Descriptive title.
# 
# Now, let's answer some basic questions, using 
# the information above, and the tree you have
# plotted.
#
# Q1: Is the phylogeny we are looking at a
#     phylogram or cladogram?
#
# Q2: Which species are most-closely related 
#     in this phylogeny?
#
# Q3: Can a cladogram have a scale bar?
#
# Q4: How many clades are shown in this tree?
#
#
# R assigns each node a node number, in order
# to keep track of the tree structure in the 
# computer. We can
# display these node numbers on the plot. Run
# this code and answer the following questions.

#######################################################
# START CODE CHUNK #2
#######################################################

# Plot the phylogeny
plot(great_ape_phylogeny)
axisPhylo()
mtext(side=1, text="Millions of years ago", line=2)
title("Simple phylogeny of the great apes")

# Add the node numbers to the plot
nodelabels()
tiplabels()

#######################################################
# END CODE CHUNK #2
#######################################################

# Q5: Give the node numbers of the tip nodes
# Q6: Give the node numbers of the internal nodes
# Q7: Give the node number of the root node
#
# Q8: Give the age of each of the 7 nodes
#
# Q9:  Which node represents the split between 
#      humans and chimps?
# Q10: Which node represents the split between 
#      humans and gorillas?
# Q11: Which node represents the split between 
#      gorillas and chimps?

# R also assigns numbers to the branches/edges.
#    Let's plot those.

#######################################################
# START CODE CHUNK #3
#######################################################
# Plot the phylogeny
plot(great_ape_phylogeny)
axisPhylo()
mtext(side=1, text="Millions of years ago", line=2)
title("Simple phylogeny of the great apes")

# Add the branch numbers to the plot
edgelabels()
#######################################################
# END CODE CHUNK #3
#######################################################

# Q12: Which branch is longest?  How long is it?
#      (include the units in your answer)
# Q13: Which branch is shortest?  How long is it?
#      (include the units in your answer)
# Q14: Why are there 6 branches, and not 9 or 12?
#      Displaying the same tree in another way
#      might help you answer:

#######################################################
# START CODE CHUNK #4
#######################################################
plot(great_ape_phylogeny, type="c")
axisPhylo()
mtext(side=1, text="Millions of years ago", line=2)
title("Simple phylogeny of the great apes,\ndiagonal branch view")
edgelabels()
#######################################################
# END CODE CHUNK #3
#######################################################

# Cladograms versus Phylograms
# Let's compare 2 versions of the great ape phylogeny:
#######################################################
# START CODE CHUNK #5
#######################################################
great_ape_newick_string1 = "(((human:6,chimpanzee:6):1,gorilla:7):5,orangutan:12);"
great_ape_newick_string2 = "(((human,chimpanzee),gorilla),orangutan);"

# And, let's plot them both:
great_ape_phylo1 = read.tree(file="", text=great_ape_newick_string1)
great_ape_phylo2 = read.tree(file="", text=great_ape_newick_string2)

# Plot 2 graphics on same page:
dev.off()   # (closes previous graphics windows)

# Plot into a PDF
pdffn = "compare_2_great_ape_phylogenies.pdf"
pdf(file=pdffn, width=8.5, height=11)

par(mfrow=c(2,1))
plot(great_ape_phylo1)
title("Tree from great_ape_newick_string1")

plot(great_ape_phylo2)
title("Tree from great_ape_newick_string2")

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #5
#######################################################

# Q15: Which tree is a cladogram?
#
# Phylogenetic trees are displayed in R, but they are 
# stored in a simple, textual format, called
# parenthetical or "newick" format.  In R, chunks
# of text are called "strings" of characters, so
# I called the variables "great_ape_newick_string1"
# and "great_ape_newick_string2". 
#
# Q16: Look at the two newick_strings. What is the difference
#      between them? How does that connect to the difference
#      between a cladogram and a phylogram?
#
# Q17: The "newick" format for phylogenies was invented
#      by some evolutionary biologists in the 1980s. 
#      Google the origin of the name "newick" -- where
#      does the name come from?
#

###################################################################
# PART IV. VERTEBRATE PHYLOGENY
###################################################################

# Now we will explore a more complex phylogeny: a backbone
# tree of vertebrates, focusing on the tetrapods.  Again, 
# in addition to the tree-thinking
# skills, LEARN THE STRUCTURE OF THE VERTEBRATE PHYLOGENY. YOU
# SHOULD BE ABLE TO DRAW IT FROM MEMORY (meaning, know the cladogram;
# you should have a general idea of the divergence times, but 
# these are only known approximately and are still debated to 
# some extent). The vertebrate cladogram is basic 
# knowledge for all of your future work in biology and evolution.
# 
# This phylogeny is simplified from Chiari et al. (2012). 
# 

#######################################################
# START CODE CHUNK #6
#######################################################

# Get the Newick string and load the tree.
dev.off()   # (closes previous graphics windows)

pdffn = "compare_2_great_ape_phylogenies.pdf"
pdf(file=pdffn, width=9, height=9)

vert_newick_str = "(shark:471,(tuna:432,(lungfish:416,(frog:352.6457263,(((((kiwi:81.06707278,seagull:81.06707278):132.1992512,crocodile:213.266324):38.01256425,turtle:251.2788883):20.2733676,((wall_lizard:134.7070246,(snake:127.5268735,anole_lizard:127.5268735):7.180151025):85.293,Tuatara:220):51.5522313):42.65854432,(platypus:187.9246145,(opossum:147.3778793,human:147.3778793):40.5467352):126.2861857):38.43492607):63.35427375):16):41);"
vert_phylo = read.tree(file="", text=vert_newick_str)
plot(vert_phylo)
axisPhylo()
title("Lab Figure 1: Backbone phylogeny of vertebrates")
mtext(side=1, text="Ma (millions of years ago)", line=3)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #6
#######################################################

# Answer these basic questions:
# 
# Q18: Is this a phylogram or cladogram?
#
# Q19: In the term "tetrapod", what do "tetra" and "pod" mean?
#
# Q20: Does everything in the tetrapod group fit this 
#      description, or does the term describe 
#      the inferred common ancestor of the group?
# 
# Q21: Approximately when did tetrapods begin to diversify?
# 
# Q22: I have put an example species of each major group at the tips.
#      Obviously, this phylogeny does not contain all of the species 
#      in these groups.  On a paper copy of Figure 1, write each
#      group name next to the tip with its representative species:

# Names corresponding to a single tip in this tree:
# Actinopterygii
# Amphibia
# Serpentes
# Sphenodon
# Testudines
# Crocodilia
# Paleognathae
# Neognathae
# Placentalia
# Monotremata
# Marsupalia
# Chondrichthyes
# Neoceratodus
# 
# Name corresponding to 3 tips in the tree
# Squamates
#
#
# Q23: For each of the groups above, look up the 
#      approximate diversity (=number of species)
#      in the group, and write it next to the group
#      name.
#
# Q24: Based on the phylogeny and further reading, 
#      why is New Zealand's tuatara much 
#      famous than most other reptiles?
# 
# Q25: Based on the phylogeny and further reading, 
#      why are Australia's platypus and echidna 
#      more famous than most other mammals?
# 
# Q26: Based on the phylogeny and further reading, 
#      why is Australia's lungfish more famous than
#      most other fishes?

# Polyphyletic groups versus monophyletic groups (=clades).
#
# Taxonomy is the science of classifying organisms. A taxon
# (plural: taxa) is the name of a group of organisms. These 
# can be a species name, genus name, some other Linnaean 
# ranked taxon (order, class, etc.), or some unranked group 
# name.
#
# Linnaean taxonomy, which goes back to Linnaeus in the 1700s,
# relies on a classification of groups-within-groups: species
# are grouped into genera, genera into families, etc. These
# groupings were typically done based on overall similarity of 
# shared characteristics.
#
# Phylogenetic systematics has evolved in recent decades, and
# a key idea of phylogenetic systematics is that taxa should
# be monophyletic groups, a.k.a. clades.  A monophyletic taxon
# includes a common ancestor and all of its descendants. If 
# a proposed taxon includes leaves out some of the 
# descendants of the common ancestor, it is paraphyletic. If
# a proposed taxon leaves out the common ancestor, it is
# polyphyletic.
#
# Phylogenetic systematists prefer monophyletic taxa, because
# they are evolutionarily "real" -- they refer to coherent
# groups on the Tree of Life (the phylogeny), which should
# be discoverable by independent investigators using 
# independent datasets.  Non-monophyletic groups are to some
# extent arbitrary -- typically based on key characteristics
# that humans find interesting, or overall appearance.
#
# During a phylogenetic analysis, researchers sometimes
# discover Linnaean ranked taxa to be 
# monophyletic, and sometimes para- or polyphyletic.
# Typically, monophyletic taxa are kept, and non-
# monophyletic taxa are replaced with new names representing
# proposed clades.  Some scientists prefer to keep the
# Linnaean ranked system (but modifying it to make all
# groups monophyletic), some argue we should abandon
# the Linnaean ranks (genus, family, etc.) and just
# use unranked clade names. This debate is ongoing, so
# you need to be aware that different scientific authors
# take different positions.
#
# Here, we will practice our skills at identifying 
# monophyletic, polyphyletic, and paraphyletic groups.
# 

# Q27: Using brackets, label these traditional groups 
#      (i.e., as understood in popular culture) on the 
#      paper copy of the "Figure 1" tree:
# 
# Names corresponding to one or more tips in the tree:
#
# mammals
# reptiles
# birds
# amphibians
# fishes

# Q28: Of the five groups, name the groups that are
#      are not monophyletic with respect to the other 
#      four.
# 
# Q29: For the non-monophyletic groups, are they 
#      polyphyletic or paraphyletic?
#
# Q30: For each non-monophyletic group, which
#      other groups would have to be included
#      to make them monophyletic?
#
# Q31: If a scientist says, "phylogenetically
#      speaking, birds are just a subgroup of
#      reptiles", what do they mean?
#
# Q32: If a scientist says, "phylogenetically
#      speaking, tetrapods are just a subgroup
#      of fishes", what do they mean?
# 
# Q33: If a taxon "fishes" is defined as a 
#      monophyletic group, are humans then
#      just an interesting type of fish,
#      phylogenetically speaking?
#  
# Q34: If a taxon "reptiles" is defined as a 
#      monophyletic group, are humans then
#      just an interesting type of reptile,
#      phylogenetically speaking?
# 
# Q35: If a taxon "reptiles" is defined as a 
#      monophyletic group, are birds then
#      just an interesting type of reptile,
#      phylogenetically speaking?

# Let's add some dinosaurs to this tree:
# 

#######################################################
# START CODE CHUNK #7
#######################################################
vert_plusDinos_newick_str = "(shark:471,(tuna:432,(lungfish:416,(frog:352.6457263,(((((((((kiwi:71.06707278,seagull:71.06707278):60,Confuciusornis:5):20,Archaeopteryx:5):7.1992512,Velociraptor:80):30,Brontosaurus:50):25,crocodile:213.266324):38.01256425,turtle:251.2788883):20.2733676,((wall_lizard:134.7070246,(snake:127.5268735,anole_lizard:127.5268735):7.180151025):85.293,Tuatara:220):51.5522313):42.65854432,(platypus:187.9246145,(opossum:147.3778793,human:147.3778793):40.5467352):126.2861857):38.43492607):63.35427375):16):41);"
vert_plusDinos_phylo = read.tree(file="", text=vert_plusDinos_newick_str)

pdffn = "backbone_dinos_wFossils.pdf"
pdf(file=pdffn, width=9, height=9)

plot(vert_plusDinos_phylo)
axisPhylo()
title("Backbone phylogeny of vertebrates,\nsome dinosaurs added")
mtext(side=1, text="Ma (millions of years ago)", line=3)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #7
#######################################################

# Q36: When someone says, "Actually, birds are
#      a type of dinosaur", what do they mean?
# 
# Q37: The smallest-known dinosaur is a species
#      living today. What is it? (use google)
#
# Q38: What two groups of tetrapods living today can fly?
#
# Q39: If "flying tetrapods" was proposed as a taxon, 
#      would this be monophyletic, paraphyletic, or
#      polyphyletic? Why?
# 

#######################################################
# Crown groups and stem groups
#
# As our ability to infer phylogenies has been improved,
# it has become useful to distinguish between "crown"
# and "stem" groups. These terms are most useful for 
# fossils, which often are related to, but fall outside
# of, a living clade ("the crown").
#
# Q40: Here, the two non-avian dinosaurs are stem groups of
#      what extant crown clade?
#
# Q41a: What taxa in your tree represent the 
#       smallest living crown clade that includes the
#       dinosaurs?
# Q41b: What is the name of this group?
#       (You may have to google this!)
#######################################################

#######################################################
# Tree-thinking: avoiding notions of "progress" and 
# "higher" versus "lower"
#######################################################
#
# Going back to the ancient Greeks, it has been 
# common to think of life as a "Great Chain of 
# Being", or a ladder.  It looks something like
# this:
#
# gods
# angels
# humans
# apes
# mammals
# reptiles
# amphibians
# fishes
# other squishy things ("lower animals")
# plants
# rocks
# 
# This way of thinking is so deeply embedded that
# it often creeps into both popular and scientific
# discussions of evolution and phylogeny. I even
# did it by accident in our vertebrate phylogeny.
# 
# Let's display the phylogeny in the way it is 
# often shown in textbooks:

#######################################################
# START CODE CHUNK #8
#######################################################
pdffn = "subset_vert_cladogram.pdf"
pdf(file=pdffn, width=9, height=9)

plot(vert_phylo, type="cladogram", direction="upwards", use.edge.length=FALSE)
title("Backbone cladogram of vertebrates")

# Or, even more simplified:
tips_to_drop = c("kiwi", "seagull", "turtle", "crocodile", "wall_lizard", "anole_lizard", "Tuatara")
vert_phylo_subset = drop.tip(vert_phylo, tip=tips_to_drop)
plot(vert_phylo_subset, type="cladogram", direction="upwards", use.edge.length=FALSE)
title("Backbone cladogram of vertebrates\n(subset)")

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #8
#######################################################

# 
# Here, because we read left-to-right, the display of
# the cladogram tends to accidentally reinforce 
# the ladder-like, "Great-Chain-of-Being" misunderstanding
# of evolution.
#
# To understand that this "linear view of evolution" is
# a misconception, one has to understand that:
#
# IN VIEWING PHYLOGENIES, NODE-ROTATION IS ARBITRARY
#
# ...meaning, that any of the branches may be rotated
# around the nodes, and EXACT SAME INFORMATION IS 
# ENCODED.  A researcher might chose to arrange a 
# tree a certain way in order to emphasise a 
# particular point, or a particular clade of 
# interest, but these are basically subjective decisions.
# 
#######################################################

# Demonstrating node rotations:

#######################################################
# START CODE CHUNK #9
#######################################################
# Open PDF
pdffn = "compare_2_vert_cladograms.pdf"
pdf(file=pdffn, width=6, height=9)

# Show 2 plots in one PDF
par(mfrow=c(2,1))
plot(vert_phylo_subset, type="phylogram", direction="upwards", use.edge.length=FALSE)
title("Display #1: Subset cladogram of vertebrates,\n(mammals at the right)")

# (rotating some nodes)
vert_phylo_subset2 = rotate(phy=vert_phylo_subset, node=c(12))
vert_phylo_subset2 = rotate(phy=vert_phylo_subset2, node=c(13))
vert_phylo_subset2 = rotate(phy=vert_phylo_subset2, node=c(14))
plot(vert_phylo_subset2, type="phylogram", direction="upwards", use.edge.length=FALSE)
title("Display #2: Subset cladogram of vertebrates,\n(amphibians at the right)")

# Stop writing PDF, open to screen
dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #9
#######################################################

# Q42: Does the mammal group "look" more primitive in the 2nd 
#      display of the phylogeny?
#
# Q43: Is the mammal group ACTUALLY any more "primitive"
#      or "advanced" in the 1st versus 2nd display of
#      the phylogeny?
#

# Arranging the full tree to put the most-recently diverging
# groups at the right:

#######################################################
# START CODE CHUNK #10
#######################################################
# Show two plots in one PDF
# Open PDF
pdffn = "compare_2_vert_cladograms.pdf"
pdf(file=pdffn, width=6, height=9)

# Show 2 plots
par(mfrow=c(2,1))
plot(vert_phylo, type="phylogram", direction="upwards", use.edge.length=FALSE)
title("Display #3: Backbone cladogram of vertebrates,\n(mammals at the right)")

vert_phylo2 = ladderize(vert_phylo, right=FALSE)
plot(vert_phylo2, type="phylogram", direction="upwards", use.edge.length=FALSE)
title("Display #4: Backbone cladogram of vertebrates,\n(most-recent divergences at the right)")

# Stop writing PDF, open to screen
dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #10
#######################################################

# Q44: What reptile species "appears" to be closest to 
#      humans in Display #3?
#
# Q45: What reptile species is ACTUALLY closest to humans,
#      evolutionarily speaking, according to a proper
#      reading of these phylogenies? (Note: trick question!)
# 
# Q46: Counting from the initial divergence of the tetrapods, 
#      approximately how many million years of evolution
#      has each of these groups experienced?
# 
#      humans
#      opossum
#      platypus
#      snake
#      Tuatara
#      crocodile
#      frog
# 
# Q47: Which of these groups is "most evolved"?
#
#      (Note: there may be some trick questions in here!)
#

#######################################################
# PART V. MAPPING CHARACTER EVOLUTION ON A PHYLOGENY:
#         FEATHERED FLIGHT AND VIVIPARITY
#######################################################
# Here, we will practice mapping character evolution
# on a phylogeny. This requires defining some terms: 
#
# character: Any characteristic of a species that is 
# well-enough defined that it can be reliably 
# recognized in a species. Examples could includee: 
# feather colour, number of legs, presence of a tail,
# etc. These would be "morphological" characters -- 
# characters about the physical morphology 
# (morphology = "study of shape") of an organism.
#
# Many other kinds of characters exist, e.g.:
# - behavioural (parental care, mating behaviour)
# - cytological (cell structure)
# - karyotypic (chromosome number and structure)
# - molecular (DNA and amino acid sequences)
#
# A "character map" is a reconstruction of the 
# evolutionary history of the character, on a 
# phylogeny.
#
# (As a statistical phylogeneticist, I prefer to 
#  call these reconstructions "statistical estimates",
#  but "reconstruction" is an easier term to begin 
#  with.)
#
# There are a number of methods to reconstruct
# the history of a character on a phylogeny. A
# method that is easy to understand intuitively
# is "parsimony". A parsimonious reconstruction
# is a history that explains the evolutionary
# history of a character using the smallest
# number of character-change events. 
#
# Other ways to say this: "The most parsimonious
# character history is the one that minimizes
# the number of changes in the character."
# 
# "The most parsimonious character history is 
# the one that minimizes the number of character
# steps."
#
# If evolutionary changes in the character are
# rare in evolutionary time, which is often
# the case, then parsimony is a reasonable
# way to make an educated guess at the
# evolutionary history of a character.
#
# In future evolution classes, you will learn 
# more about how parsimony works as a computational 
# algorithm, as well as more sophisticated
# alternatives (Maximum Likelihood and 
# Bayesian statistical approaches).
#
# For now, we will just do some intuitive 
# character reconstructions, and compare 
# them to the computer's reconstruction.
#

# Load some character data for: feathered flight
#
# Note: NEXUS format is a common computer
# format for phylogenetic datasets

#######################################################
# START CODE CHUNK #11
#######################################################
# We are putting a big text string inside '',
# then writing it to a temporary text file
# (so that we don't have to mess with people
#  downloading / misplacing files)
tetrapod_feathered_flight_nexus_format = '#NEXUS

BEGIN DATA;
    DIMENSIONS  NTAX=19 NCHAR=1;
    FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "0 1";
    MATRIX
    shark           0
    tuna            0
    lungfish        0
    frog            0
    kiwi            0
    seagull         1
    Confuciusornis  1
    Archaeopteryx   1
    Velociraptor    0
    Brontosaurus    0
    crocodile       0
    turtle          0
    wall_lizard     0
    snake           0
    anole_lizard    0
    Tuatara         0
    platypus        0
    opossum         0
    human           0
    ;
END;
'

# Write to a temporary file
datafn = "tmp1_tetrapod_feathered_flight.nexus"
write(x=tetrapod_feathered_flight_nexus_format, file=datafn)

# Load the feathered flight data into a variable
# named "feathered flight"
feathered_flight_ndf = read.nexus.data(file=datafn)
state_names = c("no feathered flight", "feathered flight")
data_levels = c(0, 1)
feathered_flight = phyDat(feathered_flight_ndf, type = "USER", levels=data_levels)
feathered_flight

# Let's plot the character of "feathered flight" at the tips 
# of the tree
state_colors = c("brown", "blue")
nexus_to_tree_tiporder = match(x=names(unlist(feathered_flight_ndf)), table=vert_plusDinos_phylo$tip.label)
colors_to_plot = state_colors[1+as.numeric(unlist(feathered_flight_ndf))[nexus_to_tree_tiporder]]

pdffn = "distrib_feathered_flight.pdf"
pdf(file=pdffn, width=9, height=9)

plot(vert_plusDinos_phylo, label.offset=15)
axisPhylo()
mtext(side=1, text="Ma (millions of years ago)", line=3)

tiplabels(text=NULL, tip=1:length(vert_plusDinos_phylo$tip.label), col=colors_to_plot, bg=colors_to_plot, pch=21, cex=3.5)
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.75)
title("Lab Figure 2: Distribution of the character of 'feathered flight'")

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #11
#######################################################

# Q48: Looking at this phylogeny, and the distribution of the 
#      "feathered flight" character, 
#
# (a) How many times do you think feathered flight evolved?  On a
#     paper copy of Figure 2, mark the branch(es) where you think 
#     feathered flight evolved.
#
# (b) How many times do you think feathered flight was lost?  On a
#     paper copy of Figure 2, mark the branch(es) where you think 
#     feathered flight was lost.
#

# Let's run a parsimony analyses on this character
parsimony_score = parsimony(tree=vert_plusDinos_phylo, data=feathered_flight, method="fitch", site="pscore")
parsimony_score

# Q49: The "parsimony score" is the number of changes reconstructed
#      in the character. Does this match your guess based on
#      intuition, above?

# Let's do an MPR (Most Parsimonious Reconstruction) of this character
# on the tree, then plot the map of the character on the tree.
#######################################################
# START CODE CHUNK #12
#######################################################
# MPR analysis
ancestral_states = ancestral.pars(tree=vert_plusDinos_phylo, data=feathered_flight, type="MPR", cost=NULL, return="prob")

# Plot to a PDF
pdffn = "feathered_flight_reconstruction_under_Fitch_parsimony.pdf"
pdf(file=pdffn, width=7, height=7)

plotAnc(tree=vert_plusDinos_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
axisPhylo()

titletxt = paste0("Mapping of ancestral states under Fitch parsimony\n(equal costs).")
title(titletxt)
mtext(side=1, text="Ma (millions of years ago)", line=3)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.5)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #12
#######################################################

# Q50: Does the parsimony reconstruction match your reconstruction
#      based on intuition?  Why or why not?

# Q51: Approximately when did feathered flight evolve? (Give an 
#      approximate minimum and maximum date.)

# Taxon sampling (which species are included in the phylogeny)
# can strongly affect ancestral character reconstructions
#
# Let's do the same parsimony analysis, but leaving out 
# the fossil taxa.

parsimony_score = parsimony(tree=vert_phylo, data=feathered_flight, method="fitch", site="pscore")
parsimony_score

#######################################################
# START CODE CHUNK #13
#######################################################
# MPR analysis
ancestral_states = ancestral.pars(tree=vert_phylo, data=feathered_flight, type="MPR", cost=NULL, return="prob")

# Plot to a PDF
pdffn = "feathered_flight_reconstruction_without_fossils.pdf"
pdf(file=pdffn, width=7, height=7)

plotAnc(tree=vert_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
axisPhylo()

titletxt = paste0("Mapping of ancestral states under Fitch parsimony\n(excluding fossils).")
title(titletxt)
mtext(side=1, text="Ma (millions of years ago)", line=3)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.5)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #13
#######################################################

# Q52: Approximately when did feathered flight evolve, according 
#      to this reconstruction?

#######################################################
# Map the appearance of feathers
#######################################################
# Now, we will map the appearance of feathers
# on the phylogeny. To do this, we have
# to load the "feathers" character, by 
# loading the NEXUS text below.
#
# I have put "0" for all taxa, indicating
# "no feathers".
# 
# Q53: For the fossil taxa that are coded as having
#      feathers, how do we know this? (use google)
#

#######################################################
# START CODE CHUNK #14
#######################################################
tetrapod_feathers_nexus_format = '#NEXUS

BEGIN DATA;
    DIMENSIONS  NTAX=19 NCHAR=1;
    FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "0 1";
    MATRIX
    shark           0
    tuna            0
    lungfish        0
    frog            0
    kiwi            1
    seagull         1
    Confuciusornis  1
    Archaeopteryx   1
    Velociraptor    1
    Brontosaurus    0
    crocodile       0
    turtle          0
    wall_lizard     0
    snake           0
    anole_lizard    0
    Tuatara         0
    platypus        0
    opossum         0
    human           0
    ;
END;
'

# Write to a temporary file
datafn = "tmp2_tetrapod_feathers.nexus"
write(x=tetrapod_feathers_nexus_format, file=datafn)

# Load the feathers character into a variable
# named "feathers"
feathers_ndf = read.nexus.data(file=datafn)
state_names = c("no feathers", "feathers")
data_levels = c(0, 1)
feathers = phyDat(feathers_ndf, type = "USER", levels=data_levels)
feathers

# Let's run a parsimony analyses on this character
parsimony_score = parsimony(tree=vert_plusDinos_phylo, data=feathers, method="fitch", site="pscore")
parsimony_score
#######################################################
# END CODE CHUNK #14
#######################################################

# Q54: Does the parsimony score match your guess based on
#      intuition, above?

# Let's do an MPR (Most Parsimonious Reconstruction) of this character
# on the tree, then plot the map of the character on the tree.

#######################################################
# START CODE CHUNK #15
#######################################################
# MPR analysis
ancestral_states = ancestral.pars(tree=vert_plusDinos_phylo, data=feathers, type="MPR", cost=NULL, return="prob")

# Plot to a PDF
pdffn = "feathers_reconstruction_under_Fitch_parsimony.pdf"
pdf(file=pdffn, width=7, height=7)

plotAnc(tree=vert_plusDinos_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
axisPhylo()

titletxt = paste0("Mapping of feathers under Fitch parsimony\n(equal costs).")
title(titletxt)
mtext(side=1, text="Ma (millions of years ago)", line=3)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.5)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #15
#######################################################

# Q55: Approximately when did feathers evolve? (Give an 
#      approximate minimum and maximum date.)
# 
# Q56: Did feathers evolve for the function of flight?
#      How do you know? If they didn't, what might
#      have the original function of feathers been?
# 

#######################################################
# Another character: 
# Viviparity (live birth)
#  ...versus...
# Oviparity (lays eggs externally
#######################################################
# The analysis we did with feathers and flight
# can be done with any well-defined character.
#
# One interesting character is "viviparity": the 
# birthing of live young. Animals that lay
# eggs external to the body exhibit "oviparity". 
#
# The popular imagination tends to think "only 
# mammals have live birth". 
#
# This is actually not true: various fishes,
# amphibians, and lizards & snakes have
# evolved viviparity (although, it is 
# true that oviparity is more common, and
# represents the ancestral condition).
#
# But, is it true that live birth is a
# defining characteristic of mammals?
#

#######################################################
# START CODE CHUNK #16
#######################################################
tetrapod_viviparity_nexus_format = '#NEXUS

BEGIN DATA;
    DIMENSIONS  NTAX=19 NCHAR=1;
    FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "0 1";
    MATRIX
    shark           0
    tuna            0
    lungfish        0
    frog            0
    kiwi            0
    seagull         0
    Confuciusornis  0
    Archaeopteryx   0
    Velociraptor    0
    Brontosaurus    0
    crocodile       0
    turtle          0
    wall_lizard     0
    snake           0
    anole_lizard    0
    Tuatara         0
    platypus        0
    opossum         1
    human           1
    ;
END;
'

# Write to a temporary file
datafn = "tmp3_tetrapod_viviparity.nexus"
write(x=tetrapod_viviparity_nexus_format, file=datafn)

# Load the feathers character into a variable
# named "viviparity"
viviparity_ndf = read.nexus.data(file=datafn)
state_names = c("oviparity", "viviparity")
data_levels = c(0, 1)
viviparity = phyDat(viviparity_ndf, type = "USER", levels=data_levels)
viviparity

# Let's run a parsimony analyses on this character
parsimony_score = parsimony(tree=vert_plusDinos_phylo, data=viviparity, method="fitch", site="pscore")
parsimony_score
#######################################################
# END CODE CHUNK #16
#######################################################

# Let's do an MPR (Most Parsimonious Reconstruction) of this character
# on the tree, then plot the map of the character on the tree.
#######################################################
# START CODE CHUNK #17
#######################################################
# MPR analysis
ancestral_states = ancestral.pars(tree=vert_plusDinos_phylo, data=viviparity, type="MPR", cost=NULL, return="prob")

# Plot to a PDF
pdffn = "viviparity_reconstruction_under_Fitch_parsimony.pdf"
pdf(file=pdffn, width=7, height=7)

plotAnc(tree=vert_plusDinos_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
axisPhylo()

titletxt = paste0("Mapping of viviparity under Fitch parsimony\n(equal costs).")
title(titletxt)
mtext(side=1, text="Ma (millions of years ago)", line=3)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.5)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #17
#######################################################

# Q57: Approximately when did viviparity evolve in mammals?
#      (Give an approximate minimum and maximum date.)
# 
# Q58: Did viviparity evolve at the same time as crown mammals?
#      How do you know? 
# 
# Q59: Five living species of mammals lay eggs. Look them up,
#      and list them here. For each, give the common name,
#      followed by the scientific name (the Latin binomial)
#      WITH CORRECT SPELLING AND FORMAT FOR THE SCIENTIFIC NAMES.
#
#      (a simple guide to scientific names is here:
#      http://entomology.ifas.ufl.edu/frank/kiss/kiss6.htm )
#
#      ...you should follow the formatting rules for scientific
#      names in all future academic work (assignments, exam 
#      essays, etc.)
#

Bonus material: The phylogeny and biogeography of kiwis, moas, and relatives

#######################################################
# PART VI. (BONUS MATERIAL) THE PHYLOGENY AND 
#  BIOGEOGRAPHY OF KIWIS, MOAS, AND RELATIVES
#######################################################
#
# We know that New Zealand and other continental
# blocks used to be connected in the ancient 
# supercontinent of Gondwanaland.  The major
# continental blocks split at approximately
# these times:
#
# Gondwanaland breakup times (all approximate)
# Sources: Gibbs 2016, Yonezawa et al. 2017
# 
# 130 mya - Africa splits from South America
#           (beginning of Atlantic Ocean)
# 130 mya - India + Madagascar break off
#           from western Australia
# 80 mya  - Zealandia splits off from eastern Australia;
#           South America-Antarctica-Australia 
#           remain connected
# ~35 mya - Australia separates from Antarctica
# ~30 mya  - Drake passage opens, separating Antarctica & South America
# 
#
# Vicariance
#
# Speciation caused by geographic separation 
# is known as "allopatric" speciation, and
# allopatric speciation caused by the 
# formation of a barrier due to external 
# factors (e.g., continental breakup)
# is known as "vicariance".
# 
# For many years, the geographic distribution 
# of the flightless ratites has been claimed 
# as a likely example of vicariance caused
# by continental break-up.
#
# Paleognaths and Neognaths
# 
# Living birds are divided into two clades: paleognaths
# and neognaths. The paleognaths include many famous
# flightless birds, including the kiwi, the extinct 
# moas, the Australian emu and cassowary, the 
# South American rhea, the ostrich, and the 
# extinct elephant birds of Madagascar.  The 
# tinamous of South America, which can fly, are
# also in the paleognath clade.
#
# The neognath clade contains all other living birds.
#
# For many years, the flightless paleognaths were
# thought to form a clade, named the ratites, with
# the flying tinamous as an outgroup.
#   
# For example, a 1997 source (Dingus & Rowe 1997) gives
# this cladogram of the ratites, based on an 
# analysis of morphological characters:

#######################################################
# START CODE CHUNK #18
#######################################################
pdffn = "ratite_clado_1997.pdf"
pdf(file=pdffn, width=6, height=6)

ratite_newick_str = "(outgroup,(Tinamidae,(((Rhea,Struthio),(Dromaeus,Casuarius)),(Dinornis,Apteryx))));"
ratite_phylo = read.tree(file="", text=ratite_newick_str)
plot(ratite_phylo, use.edge.length=FALSE)
title("Lab Figure 3: Cladogram of ratite birds\n(after Dingus & Rowe 1997)")

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #18
#######################################################

# Q60: On a paper copy of Figure 3, look up the common names of 
#      each tip, and the continent or island of origin, and write them
#      next to the scientific names.
#
# Q61: In the 1997 cladogram, what is the closest relative
#      of the kiwi clade?
#
# Q62: Based on this cladogram, how many times did
#      flightlessness evolve?  Mark the change(s)
#      on a paper version of Figure 3.
#
# Map the flightlessness character on the cladogram
# to check your work:

#######################################################
# START CODE CHUNK #19
#######################################################
# Load the flightlessness data
ratite_flight_nexus_format = '#NEXUS

BEGIN DATA;
    DIMENSIONS  NTAX=8 NCHAR=1;
    FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "0 1";
    MATRIX
    outgroup     0
    Tinamidae    0
    Dinornis     1
    Apteryx      1
    Casuarius    1
    Dromaeus     1
    Struthio     1
    Rhea         1
    ;
END;
'

# Write to a temporary file
datafn = "tmp4_ratite_flight.nexus"
write(x=ratite_flight_nexus_format, file=datafn)

# Load the flight data
ratite_flight_ndf = read.nexus.data(file=datafn)
state_names = c("no flight", "flight")
data_levels = c(0, 1)
ratite_flight = phyDat(ratite_flight_ndf, type = "USER", levels=data_levels)
ratite_flight

# Let's run a parsimony analyses on this character
parsimony_score = parsimony(tree=ratite_phylo, data=ratite_flight, method="fitch", site="pscore")
parsimony_score
#######################################################
# END CODE CHUNK #19
#######################################################

#######################################################
# START CODE CHUNK #20
#######################################################
# MPR (Most Parsimonious Reconstruction) of this character
ancestral_states = ancestral.pars(tree=ratite_phylo, data=ratite_flight, type="MPR", cost=NULL, return="prob")
names(ancestral_states)
summary(ancestral_states)

# Plot to a PDF
pdffn = "ratite_flight_reconstruction_under_Fitch_parsimony.pdf"
pdf(file=pdffn, width=7, height=7)

plotAnc(tree=ratite_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
titletxt = paste0("Mapping of ratite flight under Fitch parsimony\n(equal costs).")
title(titletxt)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.5)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #20
#######################################################

#
# A 2017 paper by Yonezawa et al. conducted a 
# comprehensive re-analysis of the phylogeny
# of paleognaths, combining morphological 
# data with DNA from living species and 
# ancient DNA from extinct subfossils of
# moas and elephant birds. This phylogeny
# (following on several other studies)
# suggested a different history of flight
# in the paleognath clade, which in turn
# impacts our ideas about its biogeographical
# history.
#
#

#######################################################
# Plot Yonezawa et al. 2017 paleognath phylogeny
#######################################################
#######################################################
# START CODE CHUNK #21
#######################################################
pdffn = "paleognath_phylo_Yonezawa_etal_2017.pdf"
pdf(file=pdffn, width=9, height=9)

paleognaths_newick_str = "(outgroup:5,((Pseudocrypturus:52.16046818,(Paracathartes_howardae:38.45691818,Lithornis:31.21926818):10.4496):6.0879,((Struthio_camelus:83.16381818,Palaeotis:33.80436818):0.51765,(((((Aepynornis:37.21791818,Mullerornis:37.21791818):28.43295,((Apteryx_australis:6.150518182,(Apteryx_australis_mantelli:4.411718182,Apteryx_rowi:4.411718182):1.7388):7.0245,(Apteryx_owenii:4.503068182,Apteryx_haastii:4.503068182):8.67195):52.47585):4.62945,((Emuarius:4.697318182,Dromaius_novaehollandiae:29.26311818):4.46775,(Casuarius_casuarius:6.644018182,Casuarius_bennetti:6.644018182):27.08685):36.54945):2.90115,((Antarctica_fossil:10.22871818,((Eudromia_elegans:29.69886818,Nothoprocta_perdicaria:29.69886818):1.7976,(Crypturellus:23.76426818,Tinamus_major:23.76426818):7.7322):8.5239):16.1595,((Dinornis_giganteus:10.52796818,Megalapteryx_didinus:10.52796818):2.4024,(Anomalopteryx_didiformis:7.508168182,(Emeus_crassus:7.170068182,Pachyornis_australis:7.170068182):0.3381):5.4222):43.2495):17.0016):1.69785,(Diogenornis:9.764618182,(Rhea_americana:10.24236818,Pterocnemia_pennata:10.24236818):58.36635):6.2706):8.80215):26.2857):5);"
paleognaths_phylo = read.tree(file="", text=paleognaths_newick_str)
plot(paleognaths_phylo)
axisPhylo()
title("Lab Figure 4: Phylogeny of paleognath birds\n(from Yonezawa et al. 2017)")
mtext(side=1, text="Ma (millions of years ago)", line=3)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #21
#######################################################

# Q63: On a paper copy of Figure 3, write the continental/island 
#      locations, and common names (or short descriptions) of the taxa
#      listed at the tips of the phylogeny.
#
# Q64: In the 2017 phylogeny, what is the closest relative
#      of the kiwi clade?
#
# Q65: Compare the 1997 and 2017 phylogenies. Flightless ratites are a
#      clade in the 1997 tree.  Are they a clade in the 2017 tree?
#      If not, are ratites paraphyletic or polyphyeletic?

#######################################################
# START CODE CHUNK #22
#######################################################
paleognath_flight_nexus_format = '#NEXUS

BEGIN DATA;
    DIMENSIONS  NTAX=30 NCHAR=1;
    FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "  0 1";
    MATRIX
    outgroup                    1
    Pseudocrypturus             1
    Paracathartes_howardae      1
    Lithornis                   1
    Struthio_camelus            0
    Palaeotis                   0
    Aepynornis                  0
    Mullerornis                 0
    Apteryx_australis           0
    Apteryx_australis_mantelli  0
    Apteryx_rowi                0
    Apteryx_owenii              0
    Apteryx_haastii             0
    Emuarius                    0
    Dromaius_novaehollandiae    0
    Casuarius_casuarius         0
    Casuarius_bennetti          0
    Antarctica_fossil           0
    Eudromia_elegans            1
    Nothoprocta_perdicaria      1
    Crypturellus                1
    Tinamus_major               1
    Dinornis_giganteus          0
    Megalapteryx_didinus        0
    Anomalopteryx_didiformis    0
    Emeus_crassus               0
    Pachyornis_australis        0
    Diogenornis                 0
    Rhea_americana              0
    Pterocnemia_pennata         0

;
END;
'

# Write to a temporary file
datafn = "tmp4_paleognath_flight.nexus"
write(x=paleognath_flight_nexus_format, file=datafn)

# Load the flightlessness vs. flight data into a variable
# named "paleognath_flight"
paleognath_flight_ndf = read.nexus.data(file=datafn)
state_names = c("no flight", "flight")
data_levels = c(0, 1)
paleognath_flight = phyDat(paleognath_flight_ndf, type = "USER", levels=data_levels)
paleognath_flight

# Let's plot the character of flightlessness vs. flight at the tips 
# of the tree
pdffn = "paleognath_phylo_with_flight_plotted.pdf"
pdf(file=pdffn, width=9, height=9)

state_colors = c("brown", "blue")
nexus_to_tree_tiporder = match(x=names(unlist(paleognath_flight_ndf)), table=paleognaths_phylo$tip.label)
colors_to_plot = state_colors[1+as.numeric(unlist(paleognath_flight_ndf))[nexus_to_tree_tiporder]]
plot(paleognaths_phylo, label.offset=3)
axisPhylo()
mtext(side=1, text="Ma (millions of years ago)", line=3)

tiplabels(text=NULL, tip=1:length(paleognaths_phylo$tip.label), col=colors_to_plot, bg=colors_to_plot, pch=21, cex=2)
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.75)
title("Distribution of the character of 'flight' in paleognaths")

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #22
#######################################################

# Q66: Intuitively, what does the Yonezawa et al. phylogeny
#      suggest about the evolution of flightlessness in 
#      the paleognaths?

# See if your intuition agrees with the parsimony reconstruction:

# Parsimony score on the flight/flightlessness character
parsimony_score = parsimony(tree=paleognaths_phylo, data=paleognath_flight, method="fitch", site="pscore")
parsimony_score

# Q67: Does the parsimony score match your guess based on
#      intuition, above?

# Let's do an MPR (Most Parsimonious Reconstruction) of this character
# on the tree, then plot the map of the character on the tree.

#######################################################
# START CODE CHUNK #23
#######################################################
# MPR analysis
ancestral_states = ancestral.pars(tree=paleognaths_phylo, data=paleognath_flight, type="MPR", cost=NULL, return="prob")
names(ancestral_states)
summary(ancestral_states)

# Plot to a PDF
pdffn = "paleognath_flight_reconstruction_under_Fitch_parsimony.pdf"
pdf(file=pdffn, width=11, height=11)

plotAnc(tree=paleognaths_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
axisPhylo()

titletxt = paste0("Mapping of paleognath flight under Fitch parsimony\n(equal costs)")
title(titletxt)
mtext(side=1, text="Ma (millions of years ago)", line=3)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.75)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28.5, labels=parsimony_txt, pos=4, cex=0.75)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #23
#######################################################

# 
# Q68: Under this reconstruction, how many times was 
#      flight lost?
#
# Q69: Under this reconstruction, how many times was
#      flight gained?
#
# Q70: Does this parsimony reconstruction of the evolution of 
#      flight/flightlessness agree with your intuition?
# 
# Q71: Do you think it makes sense to think that it is 
#      equally easy for a lineage to lose flight as to 
#      gain flight?
#
#
# Let's run another parsimony analysis, but one where
# regaining flight "costs" 10 times as much as losing
# flight.  (Here the parsimony "cost" is the number of
# steps counted per change.)

########################################################
# What if loss-of-flight is irreversible?
########################################################

#######################################################
# START CODE CHUNK #24
#######################################################
cost_of_losing_flight = 1
cost_of_regaining_flight = 10

# Set up a 2x2 cost matrix
cost_matrix = matrix(data=c(0,cost_of_losing_flight,cost_of_regaining_flight,0), ncol=2, byrow=TRUE)
colnames(cost_matrix) = c("flight","flightless")
rownames(cost_matrix) = c("flight","flightless")

# Look at the cost matrix
cost_matrix

# Now let's calculate a parsimony score, using "Sankoff" parsimony
# (Before this, we have been using "Fitch" parsimony, which assumes
#  equal costs for all changes. Under Sankoff parsimony, the user
#  assigns a cost matrix.)
parsimony_score = parsimony(tree=paleognaths_phylo, data=paleognath_flight, method="sankoff", site="pscore", cost=cost_matrix)
parsimony_score
#######################################################
# END CODE CHUNK #24
#######################################################

# Q72: What is the parsimony score under this cost matrix?

#######################################################
# START CODE CHUNK #25
#######################################################
# Reconstruct the ancestral states under this cost matrix
ancestral_states = ancestral.pars(tree=paleognaths_phylo, data=paleognath_flight, type="MPR", cost=cost_matrix, return="prob")

pdffn = "paleognath_ancestral_flight_under_Sankoff_parsimony.pdf"
pdf(file=pdffn, width=11, height=11)

plotAnc(tree=paleognaths_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
titletxt = paste0("Mapping of ancestral states under Sankoff parsimony\n(unequal costs). Cost of regaining flight=", cost_of_regaining_flight)
title(titletxt)

axisPhylo()
mtext(text="Millions of years ago (mega-annum, Ma)", side=1, line=2)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)
dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #25
#######################################################

# Q73: Under Sankoff parsimony with this cost matrix, how many times
#      was flight gained?
#
# Q74: How many times was flight lost?
#
# Q75: What does this reconstruction suggest about the vicariance
#      explanation for the distribution of the ratites?  (You
#      don't have to have a "final" answer, these questions are
#      still being actively researched.)
#

References

###################################################################
# REFERENCES
#
# Chiari, Ylenia; Cahais, Vincent; Galtier, Nicolas; 
# Delsuc, Frédéric (2012). "Phylogenomic analyses support the 
# position of turtles as the sister group of birds and 
# crocodiles (Archosauria)." BMC Biology, 10:65.
# https://doi.org/10.1186/1741-7007-10-65
# 
# Dingus, Lowell; Rowe, Timothy (1998). The Mistaken 
# Extinction & CD-Rom: Dinosaur Evolution and the Origin 
# of Birds (Academic Version), 1st Edition. W. H. Freeman, 
# pp. 1-332 + CD-ROM.  (ratite cladogram on CD-ROM)
#
# Gibbs, George C. (2016). Ghosts of Gondwana: The History of Life 
# in New Zealand. Fully revised edition. Potton & Burton, pp. 1-367.
#
# Ihaka, Ross; Gentleman, Robert (1996). "R: A Language 
# for Data Analysis and Graphics." Journal of 
# Computational and Graphical Statistics. Volume 5,
# Issue 3, pages 299-314.
# https://amstat.tandfonline.com/doi/abs/10.1080/10618600.1996.10474713
# 
# Yonezawa, Takahiro, et al. (2017). Phylogenomics and Morphology
# of Extinct Paleognaths Reveal the Origin and Evolution of the 
# Ratites. Current Biology, 27, 68–77. 
# https://www.cell.com/current-biology/fulltext/S0960-9822(16)31214-3
###################################################################
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License