Tree Thinking With R

Introduction

This the code for a lab for the University of Auckland's BioSci 109 course introducing Ecology and Evolution. All code should run "out of the box" by copying-and-pasting into R. (You may prefer to copy-paste the code into RStudio's script editing window, as this color-codes the code, making it easier to read. Each code chunk can then be executed by highlight it and clicking "run", or hitting something like command-enter (depends on keyboard/operating system).

Problem opening PDFs?

This R code puts each figure into a PDF file. It then tries to automatically open the PDF for viewing, using the R code system(cmdstr), where cmdstr just contains the string of letters containing the command "open <filename>". On most Windows and Mac machines, this will open the PDF in the default PDF viewer, typically Adobe Acrobat or Adobe Reader.

The R code puts the graphics into PDFs so that the graphics will look identical regardless of screen size, operating system, etc.

However, on some of the lab computers, the version of Adobe Acrobat that is installed seems to have expired. The program asks you to create an Adobe account, yadda yadda, which is complex and may not work anyway. The easiest workaround seems to be to

(a) find the PDF file in your home directory (type getwd() in R to see what your working directory, type Windows-Key-E to open the File Explorer in Windows), and

(b) drag the PDF file into the address bar of your Chrome browser

We have messaged tech support to try and fix the Adobe Acrobat issue.

IF ALL ELSE FAILS: All of the Figures, labeled by code chunk, are available in a single PDF in "Files", at the bottom-right of this webpage. If getting the code to work is too difficult, or time constraints are too tight, you are free to download and use this PDF to answer the study questions.

Code - Introduction to Tree-Thinking in R, using phylogenies of tetrapods

#######################################################
# BIOSCI 109 Lab: Introduction to Tree-Thinking in R,
# using phylogenies of tetrapods
# 
# by: Nick Matzke, 2019
# School of Biological Sciences
# University of Auckland
#######################################################

#######################################################
# PART I.   SET-UP
# PART II.  ULTRA-SHORT CRASH COURSE IN R
# PART III. INTRODUCTION TO TREES IN R
# PART IV.  VERTEBRATE PHYLOGENY
# PART V.   MAPPING CHARACTER EVOLUTION ON A PHYLOGENY:
#           FEATHERED FLIGHT AND VIVIPARITY
# PART VI.  BONUS MATERIAL (OPTIONAL): THE PHYLOGENY AND 
#           BIOGEOGRAPHY OF KIWIS, MOAS, AND RELATIVES
#######################################################

#######################################################
# Summary:
#
# In this lab we will learn about "tree-thinking" -- 
# how to read evolutionary trees, and how to think 
# about evolution in terms of different depictions
# of evolutionary trees.
# 
# The lab will do this using "R", an open-source 
# statistical computing language that is now used 
# in science and statistics worldwide. NO PREVIOUS
# KNOWLEDGE OF R IS REQUIRED -- all of the commands
# used here can be copy-pasted from this script
# into R.  
#
# NOTE: Phylogenetics, and R, are two of the areas 
# for which New Zealand and the University of 
# Auckland are well-known in science. For example,
# the original publication of R, Ihaka & 
# Gentleman (1996), now has over 10,000 citations
# in the scientific literature. The authors, Ross
# Ihaka and Robert Gentleman, are from the 
# Department of Statistics at the 
# University of Auckland.
#
# Assessment: 
# 
# All tree-thinking material (the specific 
# trees, the conceptual understanding of trees, etc.) 
# WILL BE FAIR GAME for assessments in this course,
# and it will also be assumed knowledge for future
# courses involving evolution.
#
# The "R" material WILL NOT be assessed -- however,
# learning R can be a very useful skill for data
# analysis, scientific graphics, and statistics.
# Hopefully, by getting a taste of R here, you will 
# be encouraged to develop your R skills in future
# courses and projects. (Note: knowing R can help
# you get a job!)
# 
#######################################################

##############################################################
# LICENSE:
# 
# Free to use/redistribute under:
# Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) 
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the above license, linked here:
# 
# http://creativecommons.org/licenses/by-nc-sa/3.0/
# 
# Summary:
#
# You are free:
#
#   * to Share -- to copy, distribute and transmit the work
#   * to Remix -- to adapt the work
#
# Under the following conditions:
#
#   * Attribution -- You must attribute the work in the manner 
#     specified by the author or licensor (but not in any way that 
#     suggests that they endorse you or your use of the work).
#   * Noncommercial -- You may not use this work for commercial purposes. 
#
#   * Share Alike -- If you alter, transform, or build upon this work,
#     you may distribute the resulting work only under the same or
#     similar license to this one. 
#
# http://creativecommons.org/licenses/by-nc-sa/3.0/
# 
###################################################################

###################################################################
#
# Please link/cite if you use this, please email me if you have 
#  thoughts/improvements/corrections: n.matzkeATauckland.ac.nz
#
# Citation: Matzke, Nicholas J. (2019). Introduction to 
# Tree-Thinking in R, using phylogenies of tetrapods and 
# paleognath birds. Lab for BIOSCI 109 at the University of 
# Auckland, New Zealand.
#
###################################################################

###################################################################
# PART I. SET-UP
###################################################################
# 
# A. Installation of R
# 
# Your lab computers should already have R (or RStudio) installed. 
# Find it and open it. 
# 
# If you are doing the lab on your own laptop or another machine, 
# you will have to install R first.  Because R is free and open-
# source, this is easy.
# 
# Two options:
# A1. Basic (terminal) R, or R.app for Macs:
#     Go to: https://cloud.r-project.org/ 
# 
# A2. RStudio:
#     https://www.rstudio.com/products/rstudio/download/
# 
# B. Installation of R packages
# 
# R "packages" allow scientists to perform specialist analyses.
# R packages are also open-source and shared freely worldwide.
# Here, we will be using 2 packages designed for phylogenetics:
#
# 1. APE ("Analysis of Phylogenetics and Evolution")
# 2. phangorn (yes, this is a nerd pun)
# 
# These packages are already installed on the lab computers.
# On other computers, you can install them by typing these 
# commands in R:
#
# (remove the "#", the comment character, to run the commands)
# 
# install.packages("ape")
# install.packages("phangorn")
# 
###################################################################

###################################################################
# PART II. ULTRA-SHORT CRASH COURSE IN R
###################################################################
# R has a "command-line interface", rather than a "point-and-click"
# interface. Although "point-and-click" is easier at first, 
# command-line is better for science, because:
#
# (a) you can save all your commands in a text file, and repeat 
# the analysis easily by copying and pasting the saved commands; 
# this is much easier than trying to remember all the clicks you 
# did, in what order.
#
# (b) these saved commands constitute an "R script" that you can 
# share with other scientists, who can modify and improve the 
# script. This lab is in the form of an R script.
# 
# (c) this is an easy form of beginning programming; eventually,
# it leads to writing your own programs and R packages, thus 
# earning fame and fortune.
# 
# We will not attempt a complete introduction to R here, just 
# the absolute basics. 
#
# COMMENTS
# 
# Any line that starts with a "#" is a comment. Comments, when
# pasted into R, do nothing.
# 
# Comments are where the script describes what is happening.
# Well-commented code is valuable because other users, or 
# your future self, can more easily figure out what each 
# piece of code is doing.
# 
# BASIC MATH
# 
# Type these commands into R to see R do math:
1+1
30/3
5*3

# STORING AND RETRIEVING VARIABLES
# 
# Variables can store numeric data
# Typing the name of the variable returns what is stored:
var1 = 1+1
var1

# Variables can also store character data
var2 = "Hello, world!"
var2

# LOGICAL TESTS
#
# The "==" operator tests if two things are equal,
# and returns TRUE or FALSE
1 == 1
var1 == 1
var1 == 2

# R can also do greater than, less than, not equal, etc.
1 < 2
2 > 1
1 != 1
1 != 2

# VARIABLES STORING ARRAYS
# A variable can store multiple items; this is a "vector"
# or "array".  Let's store a list of species names:
var3 = c("human", "chimp", "gorilla", "orangutan")
var3

# Items in an array can be accessed with square brackets: [ ]
var3[1]
var3[2]
var3[3]
var3[4]

# FOR-LOOPS
# A "for-loop" can be used to loop through an array, for
# example to print or to search for a match:

# ( copy-and-paste from "for" to "}" )
# Print each item in var3:
for (i in 1:4)
    {
    print(var3[i])
    }

# PRINT ONLY MATCHES, WITH IF-ELSE STATEMENT
# * "If-else" statements allow the code to
#   perform an action, *if* a particular
#   TRUE/FALSE statement is true. 
#
# * Here, the variable "i" iterates through
#   1, 2, 3, and 4.  For each i, var3 is 
#   checked to see if it equals "human". 
#   If so, it prints "match found". If not,
#   it prints "match not found".
#
# * If-else statements allow more complex 
#   analyses.
#
# ( copy-and-paste from "for" to the last "}" )
for (i in 1:4)
    { 
    print(var3[i])
    if (var3[i] == "human")
        {
        print("match found!")
        } else {
        print("match not found!")
        }
    }

# FUNCTIONS
#
# Functions are pre-written code that takes an input
# and produces an output.  All functions end with 
# round brackets -- "(" and ")" -- which is where
# the inputs go.
# 
# We saw an example already, with the print() 
# function:
print(var3)
print(var3[1])
print(var3[3])

# Many R functions are statistical. For example,
# random number generation:

# Draw a random number from a normal distribution
# with a mean of 1.0, standard deviation of 0.1
rnorm(n=1, mean=1.0, sd=0.1)

# Do it several times
rnorm(n=1, mean=1.0, sd=0.1)
rnorm(n=1, mean=1.0, sd=0.1)
rnorm(n=1, mean=1.0, sd=0.1)
rnorm(n=1, mean=1.0, sd=0.1)
rnorm(n=1, mean=1.0, sd=0.1)

# Generate 100 such numbers and store in var4
var4 = rnorm(n=100, mean=1.0, sd=0.1)

# R functions can also do statistical plots
# Here is a histogram plot of var4

# Looking at var4 as a list of nubmers is difficult
var4

# But, looking at a histogram is easy:
hist(var4)

# Repeat the histogram with 10,000 samples from 
# a normal distribution. Does this look
# closer to the theoretical expectation for
# a normal distribution (a "bell curve")?
var5 = rnorm(n=10000, mean=1.0, sd=0.1)
hist(var5)

# LIBRARY-ING R PACKAGES, TO ACCESS MORE FUNCTIONS
# 
# R packages store functions for specific scientific
# purposes.  For example, just like R can generate
# random numbers, R can generate random phylogenetic
# trees, with the "rtree()" function:

# Paste into R:
tr = rtree(n=10, rooted=TRUE)

# Oops, that produced an error! The "rtree" (random tree)
# function is part of the "ape" package. So we have to 
# library() the ape package first:

library(ape)
tr = rtree(n=10, rooted=TRUE)

# No error that time, since we did "library(ape)".
# Type "tr" to see what is in the variable "tr":
tr

# That's a tree with 10 tips!  Let's see what it 
# looks like, with the plot() function:
plot(tr)

# Let's add a title
title("This is a randomly-generated phylogenetic tree, with 10 tips.")

# Let's make a bigger tree, with 40 tips:
tr = rtree(n=40, rooted=TRUE)
plot(tr)
title("This is a randomly-generated phylogenetic tree, with 40 tips.")

##########################################################
# CONCLUSION TO R CRASH COURSE
#
# That's enough basic R for now.  The rest of the script 
# will use various R functions to load and display 
# phylogenies. Your job will be to copy-and-paste
# the commands, look at the tree graphic, and 
# learn about the different ways phylogenetic trees
# are displayed, interpreted -- and mis-interpreted!
##########################################################

##########################################################
##########################################################
# HELPFUL R TRICKS
#
# 1. UP ARROW -- In RStudio and most other R platforms, 
#    hitting the "up" arrow on your keyboard will bring
#    back your previous commands.  This can be very 
#    helpful for saving typing, going back to an earlier
#    step, etc.
#
# 2. Control-F or Command-F -- These keys are a shortcut
#    for "Find" in many programs.  They let you search 
#    a webpage, text file, PDf, or Word file for a 
#    specific character string. This can be useful to 
#    find a specific question in a large document. E.g.
#    to find Q32 in this long R script, type Control-F
#    and then type "Q32". (whether it is Control-F or
#    Command-F depends on the operating system. Typically
#    on Windows, it is Control-F).
#
# 3. dev.off() -- Many of the plots below will be done by:
#    a. Opening a PDF file for writing, with the "pdf"
#       command.
#    b. Sending graphics into the PDF with the "plot"
#       command
#    c. Stopping writing the PDF file with "dev.off()"
#    d. Opening the PDF for viewing with "system(cmdstr)"
#
#    If you accidentally skip step (c), all future plotting
#    commands will be adding to the PDF opened in (a), 
#    rather than into new PDFs. This can be very confusing,
#    and produce error messages about broken PDFs etc.
#    If it seems like this is happening:
#      * run "dev.off()" several times to close any 
#        writing-open pdfs and any R graphics windows.
#      * Also, manually close (with clicking) any 
#        graphics PDFs you are viewing, so that you
#        can see the new versions when they are made/opened.
##########################################################
##########################################################

#######################################################
#######################################################
# INSTRUCTIONS: COPYING-AND-PASTING CODE CHUNKS
#######################################################
#######################################################
#
# For small commands, it is fine to paste in the
# commands one, or a few, lines at a time.
#
# However, for larger plots, it is best to paste 
# in "chunks" of code. To make it clear where to
# do this, we have labeled the "code chunks" like this:
# 
#######################################################
# START CODE CHUNK #1
#######################################################
#
# (some R code)
#
#######################################################
# END CODE CHUNK #1
#######################################################
#
# When you see a code chunk, copy-and-paste the 
# whole chunk at once.
#
# There are 17 labeled code chunks in the lab exercise,
# or 25 if the bonus material in part 6 is included.
#######################################################

#######################################################
# PART III. INTRODUCTION TO TREES IN R
#######################################################

# Let's make sure our R packages are library()-ed:
library(ape)
library(phangorn)

# A. Phylogenies: Basic terminology

#######################################################
# START CODE CHUNK #1
#######################################################

# Let's load a simple phylogeny
great_ape_newick_string = "(((human:6,chimpanzee:6):1,gorilla:7):5,orangutan:12);"
great_ape_phylogeny = read.tree(file="", text=great_ape_newick_string)
great_ape_phylogeny

# And plot the phylogeny to a PDF

# Open the PDF for writing
pdffn = "chunk01_Fig1_ape_phylo.pdf"
pdf(file=pdffn, width=6, height=6)

# Send the plot to the PDF
plot(great_ape_phylogeny)
axisPhylo()
mtext(side=1, text="Millions of years ago", line=2)
title("Chunk 1, Fig.1: Simple phylogeny of the great apes")

# Close the writing of the PDF, open for viewing
dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #1
#######################################################

# Terminology: 
# Phylogenies have several parts:
# 1. Nodes. There are 3 kinds of nodes:
#    a. "Tip node", or "leaf" -- these are the "tips" of 
#       the tree. Each tip represents a species.
#    b. "Internal node" -- these are the branching 
#       points of the tree. Each represents a 
#       speciation event.
#    c. "Root node". This is the internal node at the 
#       "bottom" of the tree, i.e. the "roots" of
#       the tree.
# 
# 2. Branches or edges. The branches are the lines
#    connecting the nodes.
#    a. As you can see, some branches have different
#       lengths. However (KEY POINT), whether or not
#       the displayed branch lengths are meaningful
#       depends on what kind of phylogeny is being 
#       displayed.
#    b. In CLADOGRAMS, the branch lengths are 
#       not meaningful. Cladograms just display
#       the hierarchical pattern of relationships
#       between species, also known as the 
#       "tree topology".  In cladograms, 
#       the branches may differ in length, but 
#       this is done only for graphical 
#       convenience.
#    b. In PHYLOGRAMS, the branch lengths are 
#       meaningful. The branch lengths can
#       represent:
#       - Amount of molecular change (e.g., the
#         amount of DNA change)
#       - Amount of morphological change (e.g.,
#         the number of changes in discrete
#         characters)
#       - Amount of time elapsed. 
#
# 3. Tip names, or "tip labels". Typically tip
#    names are the common or scientific names of
#    the species being studied, and the tree is a 
#    tree of species, but tips can also be 
#    gene names, specimen names, populations,
#    languages, or anything else that one 
#    might want to study the phylogenetic
#    history of.
#
# 4. Phylogenetic graphics are scientific figures,
#    often used in the figures in textbooks, and
#    in scientific publications. As such they 
#    often have additional information beyond
#    just the nodes, branches, and tip labels
#    of the phylogeny. These can include:
#    
#    a. Scale bar or timescale, showing the 
#       units of branch length
#    b. Axis label
#    c. Descriptive title.
# 
# Now, let's answer some basic questions, using 
# the information above, and the tree you have
# plotted.
#
# Chuck 1, Q1: Is the phylogeny we are looking at a
#     phylogram or cladogram?
#
# Chuck 1, Q2: Which species are most-closely related 
#     in this phylogeny?
#
# Chuck 1, Q3: Can a cladogram have a scale bar?
#
# Chuck 1, Q4: How many clades are shown in this tree?
#
#
# R assigns each node a node number, in order
# to keep track of the tree structure in the 
# computer. We can
# display these node numbers on the plot. Run
# this code and answer the following questions.

#######################################################
# START CODE CHUNK #2
#######################################################
pdffn = "chunk02_Fig2_apes_node_numbers.pdf"
pdf(file=pdffn, width=6, height=6)

# Plot the phylogeny
plot(great_ape_phylogeny)
axisPhylo()
mtext(side=1, text="Millions of years ago", line=2)
title("Chunk 2, Fig. 2: Simple phylogeny of the great apes\n(with node numbers)")

# Add the node numbers to the plot
nodelabels()
tiplabels()

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #2
#######################################################

# Chunk 2, Q5: Give the node numbers of the tip nodes
# Chunk 2, Q6: Give the node numbers of the internal nodes
# Chunk 2, Q7: Give the node number of the root node
#
# Chunk 2, Q8: Give the age of each of the 7 nodes
#
# Chunk 2, Q9:  Which node represents the split between 
#      humans and chimps?
# Chunk 2, Q10: Which node represents the split between 
#      humans and gorillas?
# Chunk 2, Q11: Which node represents the split between 
#      gorillas and chimps?

# R also assigns numbers to the branches/edges.
#    Let's plot those.

#######################################################
# START CODE CHUNK #3
#######################################################
# Plot the phylogeny

pdffn = "chunk03_Fig3_apes_branch_numbers.pdf"
pdf(file=pdffn, width=6, height=6)

plot(great_ape_phylogeny)
axisPhylo()
mtext(side=1, text="Millions of years ago", line=2)
title("Chunk 3, Fig. 3: Simple phylogeny of the great apes\n(with branch numbers)")

# Add the branch numbers to the plot
edgelabels()

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #3
#######################################################

# Chunk 3, Q12: Which branch is longest?  How long is it?
#      (include the units in your answer)
# Chunk 3, Q13: Which branch is shortest?  How long is it?
#      (include the units in your answer)
# Chunk 3, Q14: Why are there 6 branches, and not 9 or 12?
#      Displaying the same tree in another way
#      might help you answer (see code chunk 4):

#######################################################
# START CODE CHUNK #4
#######################################################
pdffn = "chunk04_Fig4_apes_diagonal.pdf"
pdf(file=pdffn, width=6, height=6)

plot(great_ape_phylogeny, type="c")
axisPhylo()
mtext(side=1, text="Millions of years ago", line=2)
title("Chunk 4, Fig. 4: Simple phylogeny of the great apes,\ndiagonal branch view")
edgelabels()

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #4
#######################################################

# Cladograms versus Phylograms
# Let's compare 2 versions of the great ape phylogeny:
#######################################################
# START CODE CHUNK #5
#######################################################
great_ape_newick_string1 = "(((human:6,chimpanzee:6):1,gorilla:7):5,orangutan:12);"
great_ape_newick_string2 = "(((human,chimpanzee),gorilla),orangutan);"

# And, let's plot them both:
great_ape_phylo1 = read.tree(file="", text=great_ape_newick_string1)
great_ape_phylo2 = read.tree(file="", text=great_ape_newick_string2)

# Plot 2 graphics on same page:
dev.off()   # (closes previous graphics windows)

# Plot into a PDF
pdffn = "chunk05_Fig5ab_compare_2_great_ape_phylogenies.pdf"
pdf(file=pdffn, width=8.5, height=11)

par(mfrow=c(2,1))
plot(great_ape_phylo1)
title("Chunk 5, Fig. 5a: Tree from great_ape_newick_string1")

plot(great_ape_phylo2)
title("Chunk 5, Fig. 5b: Tree from great_ape_newick_string2")

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #5
#######################################################

# Chunk 5, Q15: Which tree is a cladogram?
#
# Phylogenetic trees are displayed in R, but they are 
# stored in a simple, textual format, called
# parenthetical or "newick" format.  In R, chunks
# of text are called "strings" of characters, so
# I called the variables "great_ape_newick_string1"
# and "great_ape_newick_string2". 
#
# Chunk 5, Q16: Look at the two newick_strings. What is the difference
#      between them? How does that connect to the difference
#      between a cladogram and a phylogram?
#
# Chunk 5, Q17: The "newick" format for phylogenies was invented
#      by some evolutionary biologists in the 1980s. 
#      Google the origin of the name "newick" -- where
#      does the name come from?
#

###################################################################
# PART IV. VERTEBRATE PHYLOGENY
###################################################################

# Now we will explore a more complex phylogeny: a backbone
# tree of vertebrates, focusing on the tetrapods.  Again, 
# in addition to the tree-thinking
# skills, LEARN THE STRUCTURE OF THE VERTEBRATE PHYLOGENY. YOU
# SHOULD BE ABLE TO DRAW IT FROM MEMORY (meaning, know the cladogram;
# you should have a general idea of the divergence times, but 
# these are only known approximately and are still debated to 
# some extent). The vertebrate cladogram is basic 
# knowledge for all of your future work in biology and evolution.
# 
# This phylogeny is simplified from Chiari et al. (2012). 
# 

#######################################################
# START CODE CHUNK #6
#######################################################

# Get the Newick string and load the tree.
dev.off()   # (closes previous graphics windows)

pdffn = "chunk06_Fig6_compare_2_great_ape_phylogenies.pdf"
pdf(file=pdffn, width=9, height=9)

vert_newick_str = "(shark:471,(tuna:432,(lungfish:416,(frog:352.6457263,(((((kiwi:81.06707278,seagull:81.06707278):132.1992512,crocodile:213.266324):38.01256425,turtle:251.2788883):20.2733676,((wall_lizard:134.7070246,(snake:127.5268735,anole_lizard:127.5268735):7.180151025):85.293,Tuatara:220):51.5522313):42.65854432,(platypus:187.9246145,(opossum:147.3778793,human:147.3778793):40.5467352):126.2861857):38.43492607):63.35427375):16):41);"
vert_phylo = read.tree(file="", text=vert_newick_str)
plot(vert_phylo)
axisPhylo()
title("Chunk 6, Fig. 6: Backbone phylogeny of vertebrates")
mtext(side=1, text="Ma (millions of years ago)", line=3)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #6
#######################################################

# Answer these basic questions:
# 
# Chunk 6, Q18: Is this a phylogram or cladogram?
#
# Chunk 6, Q19: In the term "tetrapod", what do "tetra" and "pod" mean?
#
# Chunk 6, Q20: Does everything in the tetrapod group fit this 
#      description, or does the term describe 
#      the inferred common ancestor of the group?
# 
# Chunk 6, Q21: Approximately when did tetrapods begin to diversify?
# 
# Chunk 6, Q22: I have put an example species of each major group at the tips.
#      Obviously, this phylogeny does not contain all of the species 
#      in these groups.  On a paper copy of Figure 6, write each
#      group name next to the tip with its representative species:

# Names corresponding to a single tip in this tree:
# Actinopterygii
# Amphibia
# Serpentes
# Sphenodon
# Testudines
# Crocodilia
# Paleognathae
# Neognathae
# Placentalia
# Monotremata
# Marsupalia
# Chondrichthyes
# Neoceratodus
# 
# Name corresponding to 3 tips in the tree
# Squamates
#
#
# Chunk 6, Q23: For each of the groups above, look up the 
#      approximate diversity (=number of species)
#      in the group, and write it next to the group
#      name.
#
# Chunk 6, Q24: Based on the phylogeny and further reading, 
#      why is New Zealand's tuatara much 
#      famous than most other reptiles?
# 
# Chunk 6, Q25: Based on the phylogeny and further reading, 
#      why are Australia's platypus and echidna 
#      more famous than most other mammals?
# 
# Chunk 6, Q26: Based on the phylogeny and further reading, 
#      why is Australia's lungfish more famous than
#      most other fishes?

# Polyphyletic groups versus monophyletic groups (=clades).
#
# Taxonomy is the science of classifying organisms. A taxon
# (plural: taxa) is the name of a group of organisms. These 
# can be a species name, genus name, some other Linnaean 
# ranked taxon (order, class, etc.), or some unranked group 
# name.
#
# Linnaean taxonomy, which goes back to Linnaeus in the 1700s,
# relies on a classification of groups-within-groups: species
# are grouped into genera, genera into families, etc. These
# groupings were typically done based on overall similarity of 
# shared characteristics.
#
# Phylogenetic systematics has evolved in recent decades, and
# a key idea of phylogenetic systematics is that taxa should
# be monophyletic groups, a.k.a. clades.  A monophyletic taxon
# includes a common ancestor and all of its descendants. If 
# a proposed taxon includes leaves out some of the 
# descendants of the common ancestor, it is paraphyletic. If
# a proposed taxon leaves out the common ancestor, it is
# polyphyletic.
#
# Phylogenetic systematists prefer monophyletic taxa, because
# they are evolutionarily "real" -- they refer to coherent
# groups on the Tree of Life (the phylogeny), which should
# be discoverable by independent investigators using 
# independent datasets.  Non-monophyletic groups are to some
# extent arbitrary -- typically based on key characteristics
# that humans find interesting, or overall appearance.
#
# During a phylogenetic analysis, researchers sometimes
# discover Linnaean ranked taxa to be 
# monophyletic, and sometimes para- or polyphyletic.
# Typically, monophyletic taxa are kept, and non-
# monophyletic taxa are replaced with new names representing
# proposed clades.  Some scientists prefer to keep the
# Linnaean ranked system (but modifying it to make all
# groups monophyletic), some argue we should abandon
# the Linnaean ranks (genus, family, etc.) and just
# use unranked clade names. This debate is ongoing, so
# you need to be aware that different scientific authors
# take different positions.
#
# Here, we will practice our skills at identifying 
# monophyletic, polyphyletic, and paraphyletic groups.
# 

# Chunk 6, Q27: Using brackets, label these traditional groups 
#      (i.e., as understood in popular culture) on the 
#      paper copy of the "Figure 6" tree:
# 
# Names corresponding to one or more tips in the tree:
#
# mammals
# reptiles
# birds
# amphibians
# fishes

# Chunk 6, Q28: Of the five groups, name the groups that are
#      are not monophyletic with respect to the other 
#      four.
# 
# Chunk 6, Q29: For the non-monophyletic groups, are they 
#      polyphyletic or paraphyletic?
#
# Chunk 6, Q30: For each non-monophyletic group, which
#      other groups would have to be included
#      to make them monophyletic?
#
# Chunk 6, Q31: If a scientist says, "phylogenetically
#      speaking, birds are just a subgroup of
#      reptiles", what do they mean?
#
# Chunk 6, Q32: If a scientist says, "phylogenetically
#      speaking, tetrapods are just a subgroup
#      of fishes", what do they mean?
# 
# Chunk 6, Q33: If a taxon "fishes" is defined as a 
#      monophyletic group, are humans then
#      just an interesting type of fish,
#      phylogenetically speaking?
#  
# Chunk 6, Q34: If a taxon "reptiles" is defined as a 
#      monophyletic group, are humans then
#      just an interesting type of reptile,
#      phylogenetically speaking?
# 
# Chunk 6, Q35: If a taxon "reptiles" is defined as a 
#      monophyletic group, are birds then
#      just an interesting type of reptile,
#      phylogenetically speaking?

#
# Let's add some dinosaurs to this tree:
# 
#######################################################
# START CODE CHUNK #7
#######################################################
vert_plusDinos_newick_str = "(shark:471,(tuna:432,(lungfish:416,(frog:352.6457263,(((((((((kiwi:71.06707278,seagull:71.06707278):60,Confuciusornis:5):20,Archaeopteryx:5):7.1992512,Velociraptor:80):30,Brontosaurus:50):25,crocodile:213.266324):38.01256425,turtle:251.2788883):20.2733676,((wall_lizard:134.7070246,(snake:127.5268735,anole_lizard:127.5268735):7.180151025):85.293,Tuatara:220):51.5522313):42.65854432,(platypus:187.9246145,(opossum:147.3778793,human:147.3778793):40.5467352):126.2861857):38.43492607):63.35427375):16):41);"
vert_plusDinos_phylo = read.tree(file="", text=vert_plusDinos_newick_str)

pdffn = "chunk07_Fig7_backbone_dinos_wFossils.pdf"
pdf(file=pdffn, width=9, height=9)

plot(vert_plusDinos_phylo)
axisPhylo()
title("Chunk 7, Fig. 7: Backbone phylogeny of vertebrates,\nsome dinosaurs added")
mtext(side=1, text="Ma (millions of years ago)", line=3)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #7
#######################################################

# Chunk 7, Q36: When someone says, "Actually, birds are
#      a type of dinosaur", what do they mean?
# 
# Chunk 7, Q37: The smallest-known dinosaur is a species
#      living today. What is it? (use google)
#
# Chunk 7, Q38: What two groups of tetrapods living today can fly?
#
# Chunk 7, Q39: If "flying tetrapods" was proposed as a taxon, 
#      would this be monophyletic, paraphyletic, or
#      polyphyletic? Why?
# 

#######################################################
# Crown groups and stem groups
#
# As our ability to infer phylogenies has been improved,
# it has become useful to distinguish between "crown"
# and "stem" groups. These terms are most useful for 
# fossils, which often are related to, but fall outside
# of, a living clade ("the crown").
#
# 
# Definitions:
# 
# To define a "crown group", pick 2 living taxa, and go back
# to their common ancestral node. Anything descending from
# this node is in the crown group.
# 
# A "stem group" is any group that falls outside of this
# crown.
# 
# This distinction is useful because it avoids trying to solve
# the problem of what defines "a bird", "a mammal", etc.
# Because evolution is slow and gradual, the characters found
# in these groups assembled gradually and not all a once. 
# 
# For example: all living birds descend from a common ancestor
# that was warm-blooded, flew, had feathers, etc.  However, we
# find various fossils that have some, but not all, of the
# characteristics of living birds. These are "transitional
# fossils".  Should we call them "birds", "bird-like
# dinosaurs", "dinobirds", or what?  We can pick a particular
# characteristic, but this is arbitrary and sometimes produced
# polyphyletic groups.
# 
# The crown/stem distinction resolves this issue: fossils like
# Archaeopteryx do not fall inside of crown birds. They are
# stem groups, on the stem of crown birds. The crown group
# they fall into is defined by the common ancestor of
# crocodylians and living birds.
# 
#
# Chunk 7, Q40: Here, the two fossil birds are stem groups of
#      what extant crown clade? 
#      (Hint: Google "crown group of birds". Yes, the 
#       Wikipedia article on crown groups is accurate 
#       as of May 7, 2019).
#
# Chunk 7, Q41a: What taxa in your tree represent the 
#       smallest living crown clade that includes the
#       dinosaurs?
# 
# Chunk 7, Q41b: What is the name of this group?
#       (You may have to google this!) (Hint: the 
#        crown clade here is defined by the common 
#        ancestor of 2 living groups: crococylians 
#        and birds. So, google "crown group crocodiles 
#        birds.")
#######################################################

#######################################################
# Tree-thinking: avoiding notions of "progress" and 
# "higher" versus "lower"
#######################################################
#
# Going back to the ancient Greeks, it has been 
# common to think of life as a "Great Chain of 
# Being", or a ladder.  It looks something like
# this:
#
# gods
# angels
# humans
# apes
# mammals
# reptiles
# amphibians
# fishes
# other squishy things ("lower animals")
# plants
# rocks
# 
# This way of thinking is so deeply embedded that
# it often creeps into both popular and scientific
# discussions of evolution and phylogeny. I even
# did it by accident in our vertebrate phylogeny.
# 
# Let's display the phylogeny in the way it is 
# often shown in textbooks:

#######################################################
# START CODE CHUNK #8
#######################################################
pdffn = "chunk08_Fig8_subset_vert_cladogram.pdf"
pdf(file=pdffn, width=9, height=9)

plot(vert_phylo, type="cladogram", direction="upwards", use.edge.length=FALSE)
title("Chunk 8, Fig. 8: Backbone cladogram of vertebrates")

# Or, even more simplified:
tips_to_drop = c("kiwi", "seagull", "turtle", "crocodile", "wall_lizard", "anole_lizard", "Tuatara")
vert_phylo_subset = drop.tip(vert_phylo, tip=tips_to_drop)
plot(vert_phylo_subset, type="cladogram", direction="upwards", use.edge.length=FALSE)
title("Chunk 8, Fig.8: Backbone cladogram of vertebrates\n(subset)")

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #8
#######################################################

# 
# Here, because we read left-to-right, the display of
# the cladogram tends to accidentally reinforce 
# the ladder-like, "Great-Chain-of-Being" misunderstanding
# of evolution.
#
# To understand that this "linear view of evolution" is
# a misconception, one has to understand that:
#
# IN VIEWING PHYLOGENIES, NODE-ROTATION IS ARBITRARY
#
# ...meaning, that any of the branches may be rotated
# around the nodes, and EXACT SAME INFORMATION IS 
# ENCODED.  A researcher might chose to arrange a 
# tree a certain way in order to emphasise a 
# particular point, or a particular clade of 
# interest, but these are basically subjective decisions.
# 
#######################################################

# Demonstrating node rotations:

#######################################################
# START CODE CHUNK #9
#######################################################
# Open PDF
pdffn = "chunk09_Fig9_compare_2_vert_cladograms.pdf"
pdf(file=pdffn, width=6, height=9)

# Show 2 plots in one PDF
par(mfrow=c(2,1))
plot(vert_phylo_subset, type="phylogram", direction="upwards", use.edge.length=FALSE)
title("Chunk 9, Fig. 9a: Subset cladogram of vertebrates,\n(mammals at the right)")

# (rotating some nodes)
vert_phylo_subset2 = rotate(phy=vert_phylo_subset, node=c(12))
vert_phylo_subset2 = rotate(phy=vert_phylo_subset2, node=c(13))
vert_phylo_subset2 = rotate(phy=vert_phylo_subset2, node=c(14))
plot(vert_phylo_subset2, type="phylogram", direction="upwards", use.edge.length=FALSE)
title("Chunk 9, Fig. 9b: Subset cladogram of vertebrates,\n(amphibians at the right)")

# Stop writing PDF, open to screen
dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #9
#######################################################

# Chunk 9, Q42: Does the mammal group "look" more primitive in 
#      Figure 9a versus Figure 9b? Why might someone think this? 
#
# Chunk 9, Q43: Is the mammal group ACTUALLY any more "primitive"
#      or "advanced" in Figure 9a versus Figure 9b of
#      the phylogeny? Why not?
#

# Arranging the full tree to put the most-recently diverging
# groups at the right:

#######################################################
# START CODE CHUNK #10
#######################################################
# Show two plots in one PDF
# Open PDF
pdffn = "chunk10_Fig10_compare_2_vert_cladograms.pdf"
pdf(file=pdffn, width=6, height=9)

# Show 2 plots
par(mfrow=c(2,1))
plot(vert_phylo, type="phylogram", direction="upwards", use.edge.length=FALSE)
title("Chunk 10, Fig. 10a: Backbone cladogram of vertebrates,\n(mammals at the right)")

vert_phylo2 = ladderize(vert_phylo, right=FALSE)
plot(vert_phylo2, type="phylogram", direction="upwards", use.edge.length=FALSE)
title("Chunk 10, Fig. 10b: Backbone cladogram of vertebrates,\n(most-recent divergences at the right)")

# Stop writing PDF, open to screen
dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #10
#######################################################

# Chunk 10, Q44: What reptile species "appears" to be closest to 
#      humans in Figure 10a?
#
# Chunk 10, Q45: What reptile species is ACTUALLY closest to humans,
#      evolutionarily speaking, according to a proper
#      reading of these phylogenies? (Note: trick question!)
# 
# Chunk 10, Q46: Counting from the initial divergence of the tetrapods, 
#      approximately how many million years of evolution
#      has each of these groups experienced?
# 
#      humans
#      opossum
#      platypus
#      snake
#      Tuatara
#      crocodile
#      frog
# 
# Chunk 10, Q47: Which of these groups is "most evolved"?
#
#      (Note: there may be some trick questions in here!)
#

#######################################################
# PART V. MAPPING CHARACTER EVOLUTION ON A PHYLOGENY:
#         FEATHERED FLIGHT AND VIVIPARITY
#######################################################
# Here, we will practice mapping character evolution
# on a phylogeny. This requires defining some terms: 
#
# character: Any characteristic of a species that is 
# well-enough defined that it can be reliably 
# recognized in a species. Examples could includee: 
# feather colour, number of legs, presence of a tail,
# etc. These would be "morphological" characters -- 
# characters about the physical morphology 
# (morphology = "study of shape") of an organism.
#
# Many other kinds of characters exist, e.g.:
# - behavioural (parental care, mating behaviour)
# - cytological (cell structure)
# - karyotypic (chromosome number and structure)
# - molecular (DNA and amino acid sequences)
#
# A "character map" is a reconstruction of the 
# evolutionary history of the character, on a 
# phylogeny.
#
# (As a statistical phylogeneticist, I prefer to 
#  call these reconstructions "statistical estimates",
#  but "reconstruction" is an easier term to begin 
#  with.)
#
# There are a number of methods to reconstruct
# the history of a character on a phylogeny. A
# method that is easy to understand intuitively
# is "parsimony". A parsimonious reconstruction
# is a history that explains the evolutionary
# history of a character using the smallest
# number of character-change events. 
#
# Other ways to say this: "The most parsimonious
# character history is the one that minimizes
# the number of changes in the character."
# 
# "The most parsimonious character history is 
# the one that minimizes the number of character
# steps."
#
# If evolutionary changes in the character are
# rare in evolutionary time, which is often
# the case, then parsimony is a reasonable
# way to make an educated guess at the
# evolutionary history of a character.
#
# In future evolution classes, you will learn 
# more about how parsimony works as a computational 
# algorithm, as well as more sophisticated
# alternatives (Maximum Likelihood and 
# Bayesian statistical approaches).
#
# For now, we will just do some intuitive 
# character reconstructions, and compare 
# them to the computer's reconstruction.
#

# Load some character data for: feathered flight
#
# Note: NEXUS format is a common computer
# format for phylogenetic datasets

#######################################################
# START CODE CHUNK #11
#######################################################
library(phangorn)  # needed for NEXUS files

# We are putting a big text string inside '',
# then writing it to a temporary text file
# (so that we don't have to mess with people
#  downloading / misplacing files)
tetrapod_feathered_flight_nexus_format = '#NEXUS

BEGIN DATA;
    DIMENSIONS  NTAX=19 NCHAR=1;
    FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "0 1";
    MATRIX
    shark           0
    tuna            0
    lungfish        0
    frog            0
    kiwi            0
    seagull         1
    Confuciusornis  1
    Archaeopteryx   1
    Velociraptor    0
    Brontosaurus    0
    crocodile       0
    turtle          0
    wall_lizard     0
    snake           0
    anole_lizard    0
    Tuatara         0
    platypus        0
    opossum         0
    human           0
    ;
END;
'

# Write to a temporary file
datafn = "chunk11_tetrapod_feathered_flight.nexus"
write(x=tetrapod_feathered_flight_nexus_format, file=datafn)

# Load the feathered flight data into a variable
# named "feathered flight"
feathered_flight_ndf = read.nexus.data(file=datafn)
state_names = c("no feathered flight", "feathered flight")
data_levels = c(0, 1)
feathered_flight = phyDat(feathered_flight_ndf, type = "USER", levels=data_levels)
feathered_flight

# Let's plot the character of "feathered flight" at the tips 
# of the tree
state_colors = c("brown", "blue")
nexus_to_tree_tiporder = match(x=names(unlist(feathered_flight_ndf)), table=vert_plusDinos_phylo$tip.label)
colors_to_plot = state_colors[1+as.numeric(unlist(feathered_flight_ndf))[nexus_to_tree_tiporder]]

pdffn = "chunk11_Fig11_distrib_feathered_flight.pdf"
pdf(file=pdffn, width=9, height=9)

plot(vert_plusDinos_phylo, label.offset=15)
axisPhylo()
mtext(side=1, text="Ma (millions of years ago)", line=3)

tiplabels(text=NULL, tip=1:length(vert_plusDinos_phylo$tip.label), col=colors_to_plot, bg=colors_to_plot, pch=21, cex=3.5)
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.75)
title("Chunk 11, Fig. 11: Distribution of the character of 'feathered flight'")

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)

# Chunk 11, Q48: Looking at this phylogeny, and the distribution of the 
#      "feathered flight" character, 
#
# Chunk 11, Q48a: How many times do you think feathered flight evolved?  On a
#     paper copy of Figure 11, mark the branch(es) where you think 
#     feathered flight evolved.
#
# Chunk 11, Q48b: How many times do you think feathered flight was lost?  On a
#     paper copy of Figure 11, mark the branch(es) where you think 
#     feathered flight was lost.
#

# Let's run a parsimony analyses on this character
parsimony_score = parsimony(tree=vert_plusDinos_phylo, data=feathered_flight, method="fitch", site="pscore")
parsimony_score

#######################################################
# END CODE CHUNK #11
#######################################################

# Chunk 11, Q49: The "parsimony score" is the number of changes reconstructed
#      in the character. Does this match your guess based on
#      intuition, above?

# Let's do an MPR (Most Parsimonious Reconstruction) of this character
# on the tree, then plot the map of the character on the tree.
#######################################################
# START CODE CHUNK #12
#######################################################
# MPR analysis
ancestral_states = ancestral.pars(tree=vert_plusDinos_phylo, data=feathered_flight, type="MPR", cost=NULL, return="prob")

# Plot to a PDF
pdffn = "chunk12_Fig12_feathered_flight_reconstruction_under_Fitch_parsimony.pdf"
pdf(file=pdffn, width=7, height=7)

plotAnc(tree=vert_plusDinos_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
axisPhylo()

titletxt = paste0("Chunk 12, Fig. 12: Mapping of ancestral states under Fitch parsimony\n(equal costs).")
title(titletxt)
mtext(side=1, text="Ma (millions of years ago)", line=3)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.5)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)

# Chunk 12, Q50: Does the parsimony reconstruction match your reconstruction
#      based on intuition?  Why or why not?

# Chunk 12, Q51: Approximately when did feathered flight evolve? (Give an 
#      approximate minimum and maximum date.)

# Taxon sampling (which species are included in the phylogeny)
# can strongly affect ancestral character reconstructions
#
# Let's do the same parsimony analysis, but leaving out 
# the fossil taxa.

#######################################################
# START CODE CHUNK #13
#######################################################

# MPR analysis
ancestral_states = ancestral.pars(tree=vert_phylo, data=feathered_flight, type="MPR", cost=NULL, return="prob")

parsimony_score = parsimony(tree=vert_phylo, data=feathered_flight, method="fitch", site="pscore")
parsimony_score

# Plot to a PDF
pdffn = "chunk13_Fig13_feathered_flight_reconstruction_without_fossils.pdf"
pdf(file=pdffn, width=7, height=7)

plotAnc(tree=vert_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
axisPhylo()

titletxt = paste0("Chunk 13, Fig. 13: Mapping of ancestral states under Fitch parsimony\n(excluding fossils).")
title(titletxt)
mtext(side=1, text="Ma (millions of years ago)", line=3)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.5)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)

#######################################################
# END CODE CHUNK #13
#######################################################

# Chunk 13, Q52: Approximately when did feathered flight evolve, according 
#      to this reconstruction?

#######################################################
# Map the appearance of feathers
#######################################################
# Now, we will map the appearance of feathers
# on the phylogeny. To do this, we have
# to load the "feathers" character, by 
# loading the NEXUS text below.
#
# I have put "0" for all taxa, indicating
# "no feathers".
# 
# Chunk 14, Q53: For the fossil taxa that are coded as having
#      feathers, how do we know this? (use google)
#

#######################################################
# START CODE CHUNK #14
#######################################################
tetrapod_feathers_nexus_format = '#NEXUS

BEGIN DATA;
    DIMENSIONS  NTAX=19 NCHAR=1;
    FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "0 1";
    MATRIX
    shark           0
    tuna            0
    lungfish        0
    frog            0
    kiwi            1
    seagull         1
    Confuciusornis  1
    Archaeopteryx   1
    Velociraptor    1
    Brontosaurus    0
    crocodile       0
    turtle          0
    wall_lizard     0
    snake           0
    anole_lizard    0
    Tuatara         0
    platypus        0
    opossum         0
    human           0
    ;
END;
'

# Write to a temporary file
datafn = "chunk14_tetrapod_feathers.nexus"
write(x=tetrapod_feathers_nexus_format, file=datafn)

# Load the feathers character into a variable
# named "feathers"
feathers_ndf = read.nexus.data(file=datafn)
state_names = c("no feathers", "feathers")
data_levels = c(0, 1)
feathers = phyDat(feathers_ndf, type = "USER", levels=data_levels)
feathers

# Let's run a parsimony analyses on this character
parsimony_score = parsimony(tree=vert_plusDinos_phylo, data=feathers, method="fitch", site="pscore")
parsimony_score
#######################################################
# END CODE CHUNK #14
#######################################################

# Chunk 14, Q54: Does the parsimony score match your guess based on
#      intuition, above? NOTE: Look in the R window at the result when 
#      you paste in "parsimony_score" (the end of code chunk 14). This 
#      number is the number of changes reconstructed on the tree.
#

# Let's do an MPR (Most Parsimonious Reconstruction) of this character
# on the tree, then plot the map of the character on the tree.

#######################################################
# START CODE CHUNK #15
#######################################################
# MPR analysis
ancestral_states = ancestral.pars(tree=vert_plusDinos_phylo, data=feathers, type="MPR", cost=NULL, return="prob")

# Plot to a PDF
pdffn = "chunk15_Fig15_feathers_reconstruction_under_Fitch_parsimony.pdf"
pdf(file=pdffn, width=7, height=7)

plotAnc(tree=vert_plusDinos_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
axisPhylo()

titletxt = paste0("Chunk 15, Fig. 15: Mapping of feathers under Fitch parsimony\n(equal costs).")
title(titletxt)
mtext(side=1, text="Ma (millions of years ago)", line=3)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.5)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #15
#######################################################

# Chunk 15, Q55: Approximately when did feathers evolve? (Give an 
#      approximate minimum and maximum date.)
# 
# Chunk 15, Q56: Did feathers evolve for the function of flight?
#      How do you know? If they didn't, what might
#      have the original function of feathers been?
# 

#######################################################
# Another character: 
# Viviparity (live birth)
#  ...versus...
# Oviparity (lays eggs externally)
#######################################################
# The analysis we did with feathers and flight
# can be done with any well-defined character.
#
# One interesting character is "viviparity": the 
# birthing of live young. Animals that lay
# eggs external to the body exhibit "oviparity". 
#
# The popular imagination tends to think "only 
# mammals have live birth". 
#
# This is actually not true: various fishes,
# amphibians, and lizards & snakes have
# evolved viviparity (although, it is 
# true that oviparity is more common, and
# represents the ancestral condition).
#
# But, is it true that live birth is a
# defining characteristic of mammals?
#

#######################################################
# START CODE CHUNK #16
#######################################################
tetrapod_viviparity_nexus_format = '#NEXUS

BEGIN DATA;
    DIMENSIONS  NTAX=19 NCHAR=1;
    FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "0 1";
    MATRIX
    shark           0
    tuna            0
    lungfish        0
    frog            0
    kiwi            0
    seagull         0
    Confuciusornis  0
    Archaeopteryx   0
    Velociraptor    0
    Brontosaurus    0
    crocodile       0
    turtle          0
    wall_lizard     0
    snake           0
    anole_lizard    0
    Tuatara         0
    platypus        0
    opossum         1
    human           1
    ;
END;
'

# Write to a temporary file
datafn = "chunk16_tetrapod_viviparity.nexus"
write(x=tetrapod_viviparity_nexus_format, file=datafn)

# Load the feathers character into a variable
# named "viviparity"
viviparity_ndf = read.nexus.data(file=datafn)
state_names = c("oviparity", "viviparity")
data_levels = c(0, 1)
viviparity = phyDat(viviparity_ndf, type = "USER", levels=data_levels)
viviparity

# Let's run a parsimony analyses on this character
parsimony_score = parsimony(tree=vert_plusDinos_phylo, data=viviparity, method="fitch", site="pscore")
parsimony_score
#######################################################
# END CODE CHUNK #16
#######################################################

# Let's do an MPR (Most Parsimonious Reconstruction) of this character
# on the tree, then plot the map of the character on the tree.
#######################################################
# START CODE CHUNK #17
#######################################################
# MPR analysis
ancestral_states = ancestral.pars(tree=vert_plusDinos_phylo, data=viviparity, type="MPR", cost=NULL, return="prob")

# Plot to a PDF
pdffn = "chunk17_Fig17_viviparity_reconstruction_under_Fitch_parsimony.pdf"
pdf(file=pdffn, width=7, height=7)

plotAnc(tree=vert_plusDinos_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
axisPhylo()

titletxt = paste0("Chunk 17, Fig. 17: Mapping of viviparity under Fitch parsimony\n(equal costs).")
title(titletxt)
mtext(side=1, text="Ma (millions of years ago)", line=3)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.5)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #17
#######################################################

# Chunk 17, Q57: Approximately when did viviparity evolve in mammals?
#      (Give an approximate minimum and maximum date.)
# 
# Chunk 17, Q58: Did viviparity evolve at the same time as crown mammals?
#      How do you know? 
# 
# Chunk 17, Q59: Five living species of mammals lay eggs. Look them up,
#      and list them here. For each, give the common name,
#      followed by the scientific name (the Latin binomial)
#      WITH CORRECT SPELLING AND FORMAT FOR THE SCIENTIFIC NAMES.
#
#      (a simple guide to scientific names is here:
#      http://entomology.ifas.ufl.edu/frank/kiss/kiss6.htm )
#
#      ...you should follow the formatting rules for scientific
#      names in all future academic work (assignments, exam 
#      essays, etc.)
#

Bonus material: The phylogeny and biogeography of kiwis, moas, and relatives

#######################################################
# PART VI. (BONUS MATERIAL) THE PHYLOGENY AND 
#  BIOGEOGRAPHY OF KIWIS, MOAS, AND RELATIVES
#######################################################
#
# We know that New Zealand and other continental
# blocks used to be connected in the ancient 
# supercontinent of Gondwanaland.  The major
# continental blocks split at approximately
# these times:
#
# Gondwanaland breakup times (all approximate)
# Sources: Gibbs 2016, Yonezawa et al. 2017
# 
# 130 mya - Africa splits from South America
#           (beginning of Atlantic Ocean)
# 130 mya - India + Madagascar break off
#           from western Australia
# 80 mya  - Zealandia splits off from eastern Australia;
#           South America-Antarctica-Australia 
#           remain connected
# ~35 mya - Australia separates from Antarctica
# ~30 mya  - Drake passage opens, separating Antarctica & South America
# 
#
# Vicariance
#
# Speciation caused by geographic separation 
# is known as "allopatric" speciation, and
# allopatric speciation caused by the 
# formation of a barrier due to external 
# factors (e.g., continental breakup)
# is known as "vicariance".
# 
# For many years, the geographic distribution 
# of the flightless ratites has been claimed 
# as a likely example of vicariance caused
# by continental break-up.
#
# Paleognaths and Neognaths
# 
# Living birds are divided into two clades: paleognaths
# and neognaths. The paleognaths include many famous
# flightless birds, including the kiwi, the extinct 
# moas, the Australian emu and cassowary, the 
# South American rhea, the ostrich, and the 
# extinct elephant birds of Madagascar.  The 
# tinamous of South America, which can fly, are
# also in the paleognath clade.
#
# The neognath clade contains all other living birds.
#
# For many years, the flightless paleognaths were
# thought to form a clade, named the ratites, with
# the flying tinamous as an outgroup.
#   
# For example, a 1997 source (Dingus & Rowe 1997) gives
# this cladogram of the ratites, based on an 
# analysis of morphological characters:

#######################################################
# START CODE CHUNK #18
#######################################################
pdffn = "chunk18_Fig18_ratite_clado_1997.pdf"
pdf(file=pdffn, width=6, height=6)

ratite_newick_str = "(outgroup,(Tinamidae,(((Rhea,Struthio),(Dromaeus,Casuarius)),(Dinornis,Apteryx))));"
ratite_phylo = read.tree(file="", text=ratite_newick_str)
plot(ratite_phylo, use.edge.length=FALSE)
title("Chunk 18, Figure 18: Cladogram of ratite birds\n(after Dingus & Rowe 1997)")

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #18
#######################################################

# Chunk 18, Q60: On a paper copy of Figure 18, look up the common names of 
#      each tip, and the continent or island of origin, and write them
#      next to the scientific names.
#
# Chunk 18, Q61: In the 1997 cladogram, what is the closest relative
#      of the kiwi clade?
#
# Chunk 18, Q62: Based on this cladogram, how many times did
#      flightlessness evolve?  Mark the change(s)
#      on a paper version of Figure 18.
#
# Map the flightlessness character on the cladogram
# to check your work:

#######################################################
# START CODE CHUNK #19
#######################################################
# Load the flightlessness data
ratite_flight_nexus_format = '#NEXUS

BEGIN DATA;
    DIMENSIONS  NTAX=8 NCHAR=1;
    FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "0 1";
    MATRIX
    outgroup     0
    Tinamidae    0
    Dinornis     1
    Apteryx      1
    Casuarius    1
    Dromaeus     1
    Struthio     1
    Rhea         1
    ;
END;
'

# Write to a temporary file
datafn = "chunk19_ratite_flight.nexus"
write(x=ratite_flight_nexus_format, file=datafn)

# Load the flight data
ratite_flight_ndf = read.nexus.data(file=datafn)
state_names = c("no flight", "flight")
data_levels = c(0, 1)
ratite_flight = phyDat(ratite_flight_ndf, type = "USER", levels=data_levels)
ratite_flight

# Let's run a parsimony analyses on this character
parsimony_score = parsimony(tree=ratite_phylo, data=ratite_flight, method="fitch", site="pscore")
parsimony_score
#######################################################
# END CODE CHUNK #19
#######################################################

#######################################################
# START CODE CHUNK #20
#######################################################
# MPR (Most Parsimonious Reconstruction) of this character
ancestral_states = ancestral.pars(tree=ratite_phylo, data=ratite_flight, type="MPR", cost=NULL, return="prob")
names(ancestral_states)
summary(ancestral_states)

# Plot to a PDF
pdffn = "chunk20_Fig20_ratite_flight_reconstruction_under_Fitch_parsimony.pdf"
pdf(file=pdffn, width=7, height=7)

plotAnc(tree=ratite_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
titletxt = paste0("Chunk 20, Fig. 20: Mapping of ratite flight under Fitch parsimony\n(equal costs).")
title(titletxt)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.5)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #20
#######################################################

#
# A 2017 paper by Yonezawa et al. conducted a 
# comprehensive re-analysis of the phylogeny
# of paleognaths, combining morphological 
# data with DNA from living species and 
# ancient DNA from extinct subfossils of
# moas and elephant birds. This phylogeny
# (following on several other studies)
# suggested a different history of flight
# in the paleognath clade, which in turn
# impacts our ideas about its biogeographical
# history.
#
#

#######################################################
# Plot Yonezawa et al. 2017 paleognath phylogeny
#######################################################
#######################################################
# START CODE CHUNK #21
#######################################################
pdffn = "chunk21_Fig21_paleognath_phylo_Yonezawa_etal_2017.pdf"
pdf(file=pdffn, width=9, height=9)

paleognaths_newick_str = "(outgroup:5,((Pseudocrypturus:52.16046818,(Paracathartes_howardae:38.45691818,Lithornis:31.21926818):10.4496):6.0879,((Struthio_camelus:83.16381818,Palaeotis:33.80436818):0.51765,(((((Aepynornis:37.21791818,Mullerornis:37.21791818):28.43295,((Apteryx_australis:6.150518182,(Apteryx_australis_mantelli:4.411718182,Apteryx_rowi:4.411718182):1.7388):7.0245,(Apteryx_owenii:4.503068182,Apteryx_haastii:4.503068182):8.67195):52.47585):4.62945,((Emuarius:4.697318182,Dromaius_novaehollandiae:29.26311818):4.46775,(Casuarius_casuarius:6.644018182,Casuarius_bennetti:6.644018182):27.08685):36.54945):2.90115,((Antarctica_fossil:10.22871818,((Eudromia_elegans:29.69886818,Nothoprocta_perdicaria:29.69886818):1.7976,(Crypturellus:23.76426818,Tinamus_major:23.76426818):7.7322):8.5239):16.1595,((Dinornis_giganteus:10.52796818,Megalapteryx_didinus:10.52796818):2.4024,(Anomalopteryx_didiformis:7.508168182,(Emeus_crassus:7.170068182,Pachyornis_australis:7.170068182):0.3381):5.4222):43.2495):17.0016):1.69785,(Diogenornis:9.764618182,(Rhea_americana:10.24236818,Pterocnemia_pennata:10.24236818):58.36635):6.2706):8.80215):26.2857):5);"
paleognaths_phylo = read.tree(file="", text=paleognaths_newick_str)
plot(paleognaths_phylo)
axisPhylo()
title("Chunk 21, Fig. 21: Phylogeny of paleognath birds\n(from Yonezawa et al. 2017)")
mtext(side=1, text="Ma (millions of years ago)", line=3)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #21
#######################################################

# Chunk 21, Q63: On a paper copy of Figure 21, write the continental/island 
#      locations, and common names (or short descriptions) of the taxa
#      listed at the tips of the phylogeny.
#
# Chunk 21, Q64: In the 2017 phylogeny, what is the closest relative
#      of the kiwi clade?
#
# Chunk 21, Q65: Compare the 1997 and 2017 phylogenies. Flightless ratites are a
#      clade in the 1997 tree.  Are they a clade in the 2017 tree?
#      If not, are ratites paraphyletic or polyphyeletic?

#######################################################
# START CODE CHUNK #22
#######################################################
paleognath_flight_nexus_format = '#NEXUS

BEGIN DATA;
    DIMENSIONS  NTAX=30 NCHAR=1;
    FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "  0 1";
    MATRIX
    outgroup                    1
    Pseudocrypturus             1
    Paracathartes_howardae      1
    Lithornis                   1
    Struthio_camelus            0
    Palaeotis                   0
    Aepynornis                  0
    Mullerornis                 0
    Apteryx_australis           0
    Apteryx_australis_mantelli  0
    Apteryx_rowi                0
    Apteryx_owenii              0
    Apteryx_haastii             0
    Emuarius                    0
    Dromaius_novaehollandiae    0
    Casuarius_casuarius         0
    Casuarius_bennetti          0
    Antarctica_fossil           0
    Eudromia_elegans            1
    Nothoprocta_perdicaria      1
    Crypturellus                1
    Tinamus_major               1
    Dinornis_giganteus          0
    Megalapteryx_didinus        0
    Anomalopteryx_didiformis    0
    Emeus_crassus               0
    Pachyornis_australis        0
    Diogenornis                 0
    Rhea_americana              0
    Pterocnemia_pennata         0

;
END;
'

# Write to a temporary file
datafn = "chunk22_paleognath_flight.nexus"
write(x=paleognath_flight_nexus_format, file=datafn)

# Load the flightlessness vs. flight data into a variable
# named "paleognath_flight"
paleognath_flight_ndf = read.nexus.data(file=datafn)
state_names = c("no flight", "flight")
data_levels = c(0, 1)
paleognath_flight = phyDat(paleognath_flight_ndf, type = "USER", levels=data_levels)
paleognath_flight

# Let's plot the character of flightlessness vs. flight at the tips 
# of the tree
pdffn = "chunk22_Fig22_paleognath_phylo_with_flight_plotted.pdf"
pdf(file=pdffn, width=9, height=9)

state_colors = c("brown", "blue")
nexus_to_tree_tiporder = match(x=names(unlist(paleognath_flight_ndf)), table=paleognaths_phylo$tip.label)
colors_to_plot = state_colors[1+as.numeric(unlist(paleognath_flight_ndf))[nexus_to_tree_tiporder]]
plot(paleognaths_phylo, label.offset=3)
axisPhylo()
mtext(side=1, text="Ma (millions of years ago)", line=3)

tiplabels(text=NULL, tip=1:length(paleognaths_phylo$tip.label), col=colors_to_plot, bg=colors_to_plot, pch=21, cex=2)
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.75)
title("Chunk 22, Fig. 22: Distribution of the character of 'flight' in paleognaths")

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #22
#######################################################

# Chunk 22, Q66: Intuitively, what does the Yonezawa et al. phylogeny
#      suggest about the evolution of flightlessness in 
#      the paleognaths?

# See if your intuition agrees with the parsimony reconstruction:

# Parsimony score on the flight/flightlessness character
parsimony_score = parsimony(tree=paleognaths_phylo, data=paleognath_flight, method="fitch", site="pscore")
parsimony_score

# Chunk 22, Q67: Does the parsimony score match your guess based on
#      intuition, above?

# Let's do an MPR (Most Parsimonious Reconstruction) of this character
# on the tree, then plot the map of the character on the tree.

#######################################################
# START CODE CHUNK #23
#######################################################
# MPR analysis
ancestral_states = ancestral.pars(tree=paleognaths_phylo, data=paleognath_flight, type="MPR", cost=NULL, return="prob")
names(ancestral_states)
summary(ancestral_states)

# Plot to a PDF
pdffn = "chunk23_Fig23_paleognath_flight_reconstruction_under_Fitch_parsimony.pdf"
pdf(file=pdffn, width=11, height=11)

plotAnc(tree=paleognaths_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
axisPhylo()

titletxt = paste0("Chunk 23, Fig. 23: Mapping of paleognath flight under Fitch parsimony\n(equal costs)")
title(titletxt)
mtext(side=1, text="Ma (millions of years ago)", line=3)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors, cex=0.75)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28.5, labels=parsimony_txt, pos=4, cex=0.75)

dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #23
#######################################################

# 
# Chunk 23, Q68: Under this reconstruction, how many times was 
#      flight lost?
#
# Chunk 23, Q69: Under this reconstruction, how many times was
#      flight gained?
#
# Chunk 23, Q70: Does this parsimony reconstruction of the evolution of 
#      flight/flightlessness agree with your intuition?
# 
# Chunk 23, Q71: Do you think it makes sense to think that it is 
#      equally easy for a lineage to lose flight as to 
#      gain flight?
#
#
# Let's run another parsimony analysis, but one where
# regaining flight "costs" 10 times as much as losing
# flight.  (Here the parsimony "cost" is the number of
# steps counted per change.)

########################################################
# What if loss-of-flight is irreversible?
########################################################

#######################################################
# START CODE CHUNK #24
#######################################################
cost_of_losing_flight = 1
cost_of_regaining_flight = 10

# Set up a 2x2 cost matrix
cost_matrix = matrix(data=c(0,cost_of_losing_flight,cost_of_regaining_flight,0), ncol=2, byrow=TRUE)
colnames(cost_matrix) = c("flight","flightless")
rownames(cost_matrix) = c("flight","flightless")

# Look at the cost matrix
cost_matrix

# Now let's calculate a parsimony score, using "Sankoff" parsimony
# (Before this, we have been using "Fitch" parsimony, which assumes
#  equal costs for all changes. Under Sankoff parsimony, the user
#  assigns a cost matrix.)
parsimony_score = parsimony(tree=paleognaths_phylo, data=paleognath_flight, method="sankoff", site="pscore", cost=cost_matrix)
parsimony_score
#######################################################
# END CODE CHUNK #24
#######################################################

# Chunk 24, Q72: What is the parsimony score under this cost matrix?

#######################################################
# START CODE CHUNK #25
#######################################################
# Reconstruct the ancestral states under this cost matrix
ancestral_states = ancestral.pars(tree=paleognaths_phylo, data=paleognath_flight, type="MPR", cost=cost_matrix, return="prob")

pdffn = "chunk25_Fig25_paleognath_ancestral_flight_under_Sankoff_parsimony.pdf"
pdf(file=pdffn, width=11, height=11)

plotAnc(tree=paleognaths_phylo, data=ancestral_states, i=1, site.pattern=TRUE, col=state_colors, pos=NULL)
titletxt = paste0("Chunk 25, Fig. 25: Mapping of ancestral states under\nSankoff parsimony (unequal costs). Cost of regaining flight=", cost_of_regaining_flight)
title(titletxt)

axisPhylo()
mtext(text="Millions of years ago (mega-annum, Ma)", side=1, line=2)

# Plot the legend
legend(x="topleft", legend=state_names, fill=state_colors)

parsimony_txt = paste0("parsimony score\n(# of steps) = ", parsimony_score) 
text(x=-8, y=28, labels=parsimony_txt, pos=4)
dev.off()
cmdstr = paste0("open ", pdffn)
system(cmdstr)
#######################################################
# END CODE CHUNK #25
#######################################################

# Chunk 25, Q73: Under Sankoff parsimony with this cost matrix, how many times
#      was flight gained?
#
# Chunk 25, Q74: How many times was flight lost?
#
# Chunk 25, Q75: What does this reconstruction suggest about the vicariance
#      explanation for the distribution of the ratites?  (You
#      don't have to have a "final" answer, these questions are
#      still being actively researched.)
#

References

###################################################################
# REFERENCES
#
# Chiari, Ylenia; Cahais, Vincent; Galtier, Nicolas; 
# Delsuc, Frédéric (2012). "Phylogenomic analyses support the 
# position of turtles as the sister group of birds and 
# crocodiles (Archosauria)." BMC Biology, 10:65.
# https://doi.org/10.1186/1741-7007-10-65
# 
# Dingus, Lowell; Rowe, Timothy (1998). The Mistaken 
# Extinction & CD-Rom: Dinosaur Evolution and the Origin 
# of Birds (Academic Version), 1st Edition. W. H. Freeman, 
# pp. 1-332 + CD-ROM.  (ratite cladogram on CD-ROM)
#
# Gibbs, George C. (2016). Ghosts of Gondwana: The History of Life 
# in New Zealand. Fully revised edition. Potton & Burton, pp. 1-367.
#
# Ihaka, Ross; Gentleman, Robert (1996). "R: A Language 
# for Data Analysis and Graphics." Journal of 
# Computational and Graphical Statistics. Volume 5,
# Issue 3, pages 299-314.
# https://amstat.tandfonline.com/doi/abs/10.1080/10618600.1996.10474713
# 
# Yonezawa, Takahiro, et al. (2017). Phylogenomics and Morphology
# of Extinct Paleognaths Reveal the Origin and Evolution of the 
# Ratites. Current Biology, 27, 68–77. 
# https://www.cell.com/current-biology/fulltext/S0960-9822(16)31214-3
###################################################################

PDF file with all Figures

For the purposes of study or reference (or if your computer / R program don't function), I have uploaded all of the graphics produced by this code in "BIOSCI109_Lab4_ALL_FIGURES.pdf".

It can be accessed by scrolling to the bottom of this page and clicking "Files".

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License