Per Event Weights Of Cladogenetic Range Inheritance Events I

Introduction

(link to this section)

This page addresses, in as introductory a way as possible, the important question about what the parameter "j" means in DEC+J and in the BioGeoBEARS supermodel (these are not necessarily identical questions). The answer gets somewhat complicated, so I recommend people put in some serious study, read this a few times, and look at the Excel file I link to, in order to "get it".

Once you "get it", it's actually not that hard to conceptualize. However, I think it was historically fairly common for researchers to not "get" the weights/sum of weights math that went into the DEC likelihood calculation at cladogenesis events in the first place, so first you have to make sure you get DEC first (described below), in order to "get" DEC+J.

Short answer

(link to this section)

On Tue, Jul 22, 2014 at 3:07 AM, a member of the BioGeoBEARS google group wrote:

Hi Nick,

Congrats for your papers being accepted!

I was wandering, if I infer a per event weight of 0.03 for j, 
does that mean that every 100 cladogenesis event, 3 are 
founder speciation?

The short answer is: no.

Medium answer

(link to this section)

j is a "per-event weight", not a straight probability. The probability of any particular cladogenetic range inheritance event is a CONDITIONAL probability, calculated as follows:

Prob(event, CONDITIONAL on a particular ancestor range) = 

= P(event | range)
= (weight of event | range) / sum(all weights of all possible events | range)

I.e., probability of an individual event is the per-event weight divided by the total of all the per-event weights for all events possible given a particular ancestor range just before cladogenesis.

Long answer

(link to this section)

The reason for the weight/sum of weights calculation goes back to the DEC model of Ree et al. (2005) and Ree & Smith (2008), as coded in LAGRANGE.

To model the "C" (cladogenesis) part of DEC, they noted that there were a number of possible events:

  1. single-area sympatry/range-copying, e.g. A->A,A
  2. subset sympatry, e.g. AB -> AB, A
  3. vicariance, e.g. AB -> B, A

They decided to give each possible event, conditional on a particular ancestor, equal weight. This translates to probabilities as follows:

(for a 3-area system…it will be different for 2, 3, 4, 5 areas, etc.)

DEC model weights to probabilities, 3-area system

(link to this section)

Assuming:
DEC model, 3 area system
Weights: y=1, s=1, v=1 (and j=0)

Conditional on ancestor: A, 3 areas total

======================================
Total possible events: 1
Total weight of events: 1
Probability of EACH CLASS of events, under DEC
Prob(sympatry) = P(event of class y) = 1 / 1 = 1

Probability of EACH INDIVIDUAL EVENT, under DEC
P(particular y event) = y/1 = 1 
======================================

Conditional on ancestor: AB, 3 areas total

======================================
Total possible events: 6 (0 sympatry, 2 vicariance, 4 subset)
Total weight of events: 6

Probability of EACH CLASS of events, under DEC
Prob(sympatry) = P(event of class y) = 0*y / 6.0 = 0
Prob(vicariance) = P(event of class v) = 2*v / 6.0 = 1/3
Prob(subset-sympatry) = P(event of class s) = 4*s / 6.0 = 2/3

Probability of EACH INDIVIDUAL EVENT, under DEC
P(particular y event) = y/6 = 0 
P(particular v event) = v/6 = 1/6
P(particular s event) = s/6 = 1/6 
======================================

Conditional on ancestor: ABC, 3 areas total

======================================
Total possible events: 12 (0 sympatry, 6 vicariance, 6 subset)
Total weight of events: 12

Probability of EACH CLASS of events, under DEC
Prob(sympatry) = P(event of class y) = 0*y / 12 = 0
Prob(vicariance) = P(event of class v) = 6*v / 12 = 0.5
Prob(subset-sympatry) = P(event of class s) = 6*s / 12 = 0.5

Probability of EACH INDIVIDUAL EVENT, under DEC
P(particular y event) = y/12 = 0 
P(particular v event) = v/12 = 1/12 
P(particular s event) = s/12 = 1/12 
======================================

Note the difference between the probabilities of each CLASS of events, and each individual event. The numbers of different types of cladogenesis events change depending on whether an ancestor is A, AB, ABC, etc.

To address what you maybe be thinking now: Yes, this weights vs probabilities issue is at the heart of why DEC is probably not understood very well even by many people who have used it.

(Note: There is not an obvious way to keep the probability of each CLASS of events the same between different-sized ancestors. If it were done (at least for ranges of size 2 through size numareas-1), it would be a different model from DEC, and, I suspect, not a better one. After all, it could be the case that, in real life, classes of biogeographical events that have more possible events, actually do have more probability as a class. Compared to other variations on possible models, my intuition is that this imaginable model variant is less likely to be important than other modifications, but people are free to construct such models and test them if they like, of course.)

Now, for completeness, I will repeat these calculations for the DEC+J model, where y=s=v=(3-j)/3.

DEC+J model weights to probabilities, 3-area system

(link to this section)

DEC+J model, 3 area system
Weights:
j=0.03
y=s=v=(3-j)/3 = (3-0.03)/3 = 1-0.01 = 0.99
So:
y=0.99, s=0.99, v=0.99, j=0.03

Conditional on ancestor: A, 3 areas total

======================================
Total possible events: 5 (1 sympatry, 4 founder)
Total weight of events: 1*y + 4*j = 0.99 + 0.12 = 1.11

Probability of EACH CLASS of events, under DEC+J
Prob(sympatry) = P(event of class y) = 0.99 / 1.11 = 0.8919
Prob(founder) = P(event of class j) = 0.04 / 1.11 = 0.1081

Probability of EACH INDIVIDUAL EVENT, under DEC+J
P(particular y event) = y/1.11 = 0.99 / 1.11 = 0.8919
P(particular j event) = j/11.11 = 0.03 / 1.11 = 0.0270
======================================

Conditional on ancestor: AB, 3 areas total

======================================
Total possible events: 8 (0 sympatry, 2 vicariance, 4 subset, 2 founder)
Total weight of events: 0*y + 2*v + 4*s + 2*j = 6.0

Probability of EACH CLASS of events, under DEC+J
Prob(sympatry) = P(event of class y) = 0*y / 6.0 = 0
Prob(vicariance) = P(event of class v) = 2*v / 6.0 = 0.3300
Prob(subset-sympatry) = P(event of class s) = 4*s / 6.0 = 0.6600
Prob(founder) = P(event of class j) = 2*j / 6.0 = 0.01

Probability of EACH INDIVIDUAL EVENT, under DEC+J
P(particular y event) = 0 
P(particular v event) = v/6 = 0.99/6 = 0.165
P(particular s event) = s/6 = 0.99/6 = 0.165
P(particular j event) = j/6 = 0.03/6 = 0.005
======================================

Conditional on ancestor: ABC, 3 areas total

======================================
Total possible events: 12 (0 sympatry, 6 vicariance, 6 subset, 0 founder)
Total weight of events: 0*y + 6*v + 6*s + 0*j = 11.88

Probability of EACH CLASS of events, under DEC+J
Prob(sympatry) = P(event of class y) = 0*y / 11.88 = 0
Prob(vicariance) = P(event of class v) = 6*v / 11.88 = 0.5
Prob(subset-sympatry) = P(event of class s) = 6*s / 11.88 = 0.5
Prob(founder) = P(event of class s) = 0*j / 11.88 = 0

Probability of EACH INDIVIDUAL EVENT, under DEC+J
P(particular y event) = 0 
P(particular v event) = v/11.88 = 1/12 
P(particular s event) = s/11.88 = 1/12 
P(particular j event) = 0
======================================

Further Information and Excel File

(link to this section)

I discuss the weights vs. probabilities issue in more detail in my Systematic Biology manuscript, particularly in Table 1 and Figure 1. I have put the ms, post-minor-revisions, online here:

https://b0616d00ff35bf3cda9aa2c51a671878457b2402.googledrive.com/host/0B6HJElw5txrwZlhjQ0ZUcEdzUG8/2013-08-10_matzke_PhD.html#accepted

The Supplemental Data has an Excel file which does the weight calculations for DEC and DEC+J for 2-, 3-, and 4-area systems (after that the cladogenesis matrix gets too huge).

I have put the Excel file here. Start with the "custom_wts_DECJ_1area" worksheet:

https://b0616d00ff35bf3cda9aa2c51a671878457b2402.googledrive.com/host/0B6HJElw5txrwZlhjQ0ZUcEdzUG8/ch2_SuppData_01_cladogenesis_parameterizations_v04.xlsx

The right way to count events — Biogeographical Stochastic Mapping (BSM)

(link to this section)

See also: Stochastic mapping under biogeographical models (in progress).

Now, BioGeoBEARS has functions that do all of the above calculations quickly. (This becomes extremely non-trivial for large state spaces — e.g., 9 areas = 2^9 ranges/states = 512 states = 511 non-null ranges = 511x511x511 combinations of (ancestor range, left descendant range, right descendant range) = 133,432,831 combinations which have to be assigned conditional probabilities from the weights. The R package cladoRcpp is what iterates through these combinations quickly.)

The weights strategy makes this fast and tractable, but it does cause a problem for the interpretation of j (and y, s, and v). Higher j means more founder events, but the probability of founder events as a class will be different conditional on each possible ancestor range size (and the specific range, if there are any other modifiers due to constraints).

Furthermore, even if you calculate the per-event probabilities and the per-event-class probabilities under some specific combination of ranges and parameters, this just gives you the probabilities per cladogenesis event, conditional on each possible ancestor range.

It still doesn't get you a real estimate of the overall probability of founder events on your phylogeny. This is because that probability will depend on the ancestral state estimates and their probabilities. If you estimate that the ancestor nodes had widespread ranges, this will mean there were fewer opportunities for founder-events, compared to other sorts of events, than if the ancestor nodes had narrow ranges.

Your only real options to get the overall probability of founder-events, conditional on the data, tree, and model, are:

  1. Stare at your ancestral range estimation graphic, and attempt to count obvious founder-events by hand. In other words, guestimate.
  2. Do Biogeographical Stochastic Mapping (BSM) conditional on the tree, data, and model, repeat 100 times, and get the mean and variance of the counts of each event type.

As it happens, I've basically got #2 working, although testing/tutorials etc. will take awhile. Email me if you are someone who is interested in collaborating on doing this.

In my stochastic mapping test run on the Psychotria data (19 taxa, 4 areas, M0 unconstrained model), I get:

DEC model ML parameters: d=0.034, e=0.028, j=0

==========================================
Biogeographical stochastic mapping event counts, 100 realizations on Psychotria (19 taxa, 4 areas, M0 unconstrained)

Mean of event counts:
       d        e        a        y        s        v        j 
5.527778 1.629630 0.000000 9.842593 3.444444 4.712963 0.000000 

Standard deviation of event counts:
        d         e         a         y         s         v         j 
0.8367531 1.0987894 0.0000000 1.5234125 1.4997404 0.8652257 0.0000000 
==========================================

DEC+J model ML parameters: d=0.0, e=0.0, j=0.11

==========================================
Biogeographical stochastic mapping event counts, 100 realizations on Psychotria (19 taxa, 4 areas, M0 unconstrained)

Mean of event counts:
        d         e         a         y         s         v         j 
0.0000000 0.0000000 0.0000000 9.9629630 0.4537037 0.4259259 7.1574074 

Standard deviation of event counts:
        d         e         a         y         s         v         j 
0.0000000 0.0000000 0.0000000 1.1991112 0.8579944 0.4967879 0.8877361 
==========================================

So, you can see that, for the Hawaiian Psychotria M0 DEC+J model, when the j parameter ~0.11, about 40% of the cladogenesis events are founder-events, about 55% are within-island sympatry, and only about 5% are vicariance or subset sympatry, even though y=s=v=(3-j)/3 = 0.96.

The Psychotria tree has only 19 tips and thus 18 internal nodes, so 5% vicariance/subset means that, on average across the stochastic maps, perhaps only one of the nodes has an event that is vicariance or subset.

Note that all of these statements are UNDER THE SPECIFIED MODEL, PARAMETERS, DATA, AND TREE (which we assume, incorrectly, to be the true tree without extinction, which may or may not be important, depending). Change any of these, and you may get different answers about what the estimated weight parameters imply.

The nice thing about stochastic mapping is that the counts of events are comparable, or at least one can compare e.g. events per million years of branchlength, events per (observed) cladogenesis event, etc.

Histograms of event counts, Psychotria M0 model, DEC vs. DEC+J

(link to this section)

Cladogenetic event counts for 100 stochastic maps on DEC model ML parameters: d=0.034, e=0.028, j=0:

stochastic_map_counts_Psychotria_M0_DEC.png

Cladogenetic event counts for 100 stochastic maps on DEC+J model ML parameters: d=0.0, e=0.0, j=0.11:

stochastic_map_counts_Psychotria_M0_DEC%2BJ.png

Other models and what j etc. would mean in them

(link to this section)

Because of this weighting strategy in DEC-type models, it is VERY IMPORTANT to think carefully about what you are doing with y, s, v, and j in other models. In my example models DIVALIKE+J and BAYAREALIKE+J, you can see via…

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table

…that I have set:

DIVALIKE+J:
y = (2-j)/2
v = (2-j)/2
s = 0
j = j, 0<j<2
BAYAREALIKE+J:
y = (1-j)/1
v = 0
s = 0
j = j, 0<j<1

This was done to imitate the DEC idea, that (assuming j=0), each event that is possible gets a weight of 1. Note that this means that j in DEC+J is not strictly comparable to j in DIVALIKE+J or BAYAREALIKE+J. There actually is not a way to make them strictly equivalent, because the weights are all relative to the other weights, and e.g. DIVALIKE has different numbers of possible vicariance events than DEC, has 0 subset sympatry events, etc.

Note: As I mention in the Systematic Biology manuscript, because everything is divided by the sum of the per-event weights, a number of parameterizations could be imagined which would all be equivalent. If you modify the default models, you have to ensure identifiability. If you set y, s, v, and j to all be y=s=v=j=1, then, because of the division by the sum of the weights, this would be the same model as y=s=v=j=0.1. Thus if all 4 parameters are free and not forced to sum to something, there are an infinite number of parameter combinations that would result in the same likelihood, and your ML search will never converge.

Note also: The correct way to do the hypothetical 4-free-parameter cladogenesis model would be with a simplex (and actually this would just be 3 free parameters, since specifying the three would automatically specify the fourth), but it would take some somewhat serious experimentation: first, to ensure that your specification of the BioGeoBEARS model is doing a simplex (it may take some new code, I haven't thought about it very much); second, to ensure that your ML optimization routine is searching the simplex parameter space properly (no guarantee with optim/optimx); and third, that your data actually supports that many free parameters.

Final note: The whole BioGeoBEARS strategy for doing cladogenesis models was just an attempt to extend and test DEC, and it is just a first attempt. There could obviously be better ways, although they might be more complex and require new programming. There is no particular reason, for instance, that we have to link weight parameters across different ancestral range sizes. This linkage causes some interesting effects, e.g. in DEC+J if you have a lot of single-area sympatry (e.g. within-island speciation), this will force y to be high, which also forces s and v to be high, even if they are completely unnecessary.

Acknowledgements

(link to this section)

Thanks to Julien for asking the question and thus inspiring the BioGeoBEARS google group answer, and to Fredrik Ronquist who suggested the simplex during reviews of my ms.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License