Explore Genome Size Data

This R script is part of a lab exercise for BioSci210: Evolution, at the University of Auckland.

It is designed to be run simply by copying and pasting the script into R. The necessary files are hosted online, so you must be connected to the internet for this to work.

Copy and paste the script in chunks, reading the comments as you go and taking the time to look at and understand each plot that is produced. Be careful not to skip sections: doing so may cause errors and force you to go back and paste the earlier chunks again.

Note for running offline: If you want to run the script offline, save each file to a working directory on your local hard drive, then point the R script at that directory (wd="working directory") using the setwd() function (getwd() confirms the current directory), and edit the trfn and datafn lines to refer to the local copies of the files. You can also download the R functions from Gist and source() the local versions of those files.
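The offline setup described in the note might look roughly like the sketch below. The folder name and all three file names are hypothetical placeholders (the real names are whatever you saved the downloads as); only the variable names wd, trfn and datafn come from the note above.

```r
# Hypothetical folder holding your downloaded copies; change this to your own path.
wd <- "~/genome_size_lab"

# Only switch directories if the folder actually exists, so this chunk never errors.
if (dir.exists(wd)) {
  setwd(wd)
}
print(getwd())  # confirm which directory R is now using

# Hypothetical local file names; replace with the names of the files you saved.
trfn   <- "tree.newick"            # phylogenetic tree file
datafn <- "genome_size_data.txt"   # genome size data table

# Report any file that is not where the script will look for it.
local_files <- c(trfn, datafn)
missing_files <- local_files[!file.exists(local_files)]
if (length(missing_files) > 0) {
  message("Not found in ", getwd(), ": ", paste(missing_files, collapse = ", "))
}

# If you saved the R functions locally, load them from the saved file
# (hypothetical file name), instead of from the online copy:
# source("lab_functions.R")
```

If the message lists a file, check the spelling of the name and that the file really sits in the directory printed by getwd().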

Main Goals

The main goals of this exercise include:

  • Observe the diversity of genome sizes
  • Understand scatter plots and regressions
  • Understand log-scales for the x- and y-axis of scatter plots
  • Understand where human genome size fits in the diversity of genome size
  • Understand the difference between simple linear regression and Phylogenetically Independent Contrasts
  • Understand that correlations that appear significant when phylogeny is ignored may disappear once phylogeny is taken into account
  • Observe some phylogenetically-structured patterns in genome size, and speculate about possible causes
  • Ask yourself if the genome size of humans appears "special" or "optimal" in any obvious way
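To make the plotting and regression goals above more concrete, here is a minimal sketch using simulated numbers, not the lab's genome size data. All variable names are made up for illustration. The last part assumes the ape package is installed (it is skipped otherwise) and is not necessarily how the lab script itself computes contrasts.

```r
set.seed(1)
n_species <- 50

# Simulated values spanning four orders of magnitude (NOT real genome sizes).
sim_x <- 10^runif(n_species, min = 0, max = 4)
sim_y <- 10^(0.5 * log10(sim_x) + rnorm(n_species, sd = 0.3))

# On linear axes most points are squashed into one corner of the plot...
plot(sim_x, sim_y, main = "Linear axes")

# ...while log-scaled x- and y-axes spread the points out evenly.
plot(sim_x, sim_y, log = "xy", main = "Log-log axes")

# Simple linear regression on the log10-transformed values.
log_fit <- lm(log10(sim_y) ~ log10(sim_x))
abline(log_fit)   # on log axes, abline() draws the line in log10 coordinates
print(summary(log_fit))

# Phylogenetically Independent Contrasts, only if 'ape' is available.
if (requireNamespace("ape", quietly = TRUE)) {
  sim_tree <- ape::rtree(n_species)       # random tree with branch lengths
  trait_a  <- ape::rTraitCont(sim_tree)   # two traits simulated on the tree,
  trait_b  <- ape::rTraitCont(sim_tree)   # independently of each other

  # Ordinary regression treats every species as an independent data point.
  raw_fit <- lm(trait_b ~ trait_a)

  # Contrasts are differences between sister lineages, scaled by branch length;
  # regressions on contrasts are conventionally forced through the origin.
  pic_a   <- ape::pic(trait_a, sim_tree)
  pic_b   <- ape::pic(trait_b, sim_tree)
  pic_fit <- lm(pic_b ~ pic_a - 1)

  print(summary(raw_fit))
  print(summary(pic_fit))
}
```

Comparing the two summaries in the last part shows how the same pair of traits can give different answers depending on whether the tree is taken into account; with random simulated trees the outcome varies from run to run.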

Side Goals

  • This lab is not intended to train you in R, but it will give you some exposure to it. If you are interested in learning R (which is a great job skill for biologists and other scientists), I would recommend (1) working through this introductory tutorial: http://phylo.wikidot.com/2014-summer-research-experiences-sre-at-nimbios-for-undergra ; (2) working through the genome-size script slowly, trying to understand the major actions being done; and (3) taking a workshop or a biostatistics/bioinformatics course with a heavy R focus.
  • Similarly, this lab is not intended to train you in statistics, but it will give you some additional exposure.

Genome size analysis script (should run "out of the box")

insert the code here