I am in Austin this weekend for the Phylogenomics and metagenomics symposium and workshop organized by Tandy Warnow. Today was the symposium; tomorrow is the workshop. Lots of really interesting and exciting methods were presented today. Before getting here, I thought the thing I was most interested in was simultaneous estimation of trees and alignments (SATe-type tools), and these were indeed really cool and interesting. However, two speakers really grabbed my attention with ideas involving “importance sampling”.
Fair warning: Anything that doesn’t make sense below is certainly due to my own misunderstanding.
The first of these was Mark Holder. He was discussing stepping-stone sampling to estimate the marginal likelihood of a model. Part of this process involved generating trees that are similar to those in the posterior but were not actually sampled. The challenge is to do this in an informed way, where you use the information in the posterior to guide your choice of trees. The way this works is that you start from a consensus tree, in which each edge has a posterior probability. To produce a new tree, you work your way through the consensus tree, retaining each edge with probability equal to its posterior probability. This leaves you with a number of multifurcations, which then need to be resolved in some way other than the way they are found in the consensus tree. Repeating this gives you a new sample of trees that is centered on the posterior distribution but more spread out.
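To make the procedure concrete for myself, here is a minimal Python sketch of how I understand it; it is not Holder's implementation. It assumes a rooted consensus tree built from hypothetical `Node` objects whose `support` field holds each clade's posterior probability. Each edge is kept with probability equal to its support; when an edge is dropped, that clade's children are spliced into the parent, forming a multifurcation. The multifurcation is then resolved by joining random pairs of subtrees, rejecting any resolution that rebuilds a clade that was just collapsed. The function names (`perturb`, `random_resolution`) and the rejection step are my own assumptions, and random pairing does not sample resolutions uniformly.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical consensus-tree node: `support` is the clade's posterior probability."""
    children: list
    support: float = 1.0


def leaves(t):
    """Leaf set of a leaf name, a consensus Node, or a sampled tree (nested tuples)."""
    if isinstance(t, str):
        return frozenset([t])
    kids = t.children if isinstance(t, Node) else t
    return frozenset().union(*(leaves(c) for c in kids))


def clades(t):
    """All internal clades (as leaf sets) of a sampled tree made of nested tuples."""
    if isinstance(t, str):
        return set()
    out = {leaves(t)}
    for c in t:
        out |= clades(c)
    return out


def random_resolution(parts, rng):
    """Resolve a multifurcation into a binary subtree by joining random pairs."""
    parts = list(parts)
    while len(parts) > 2:
        i, j = sorted(rng.sample(range(len(parts)), 2))
        b = parts.pop(j)  # pop the larger index first so index i stays valid
        a = parts.pop(i)
        parts.append((a, b))
    return tuple(parts)


def perturb(node, rng, max_tries=1000):
    """Draw one tree near the consensus: keep each edge with probability `support`."""
    if isinstance(node, str):
        return node
    parts, collapsed = [], set()
    stack = list(node.children)
    while stack:
        child = stack.pop()
        if isinstance(child, Node) and rng.random() >= child.support:
            collapsed.add(leaves(child))  # edge dropped: remember the lost clade
            stack.extend(child.children)  # its children join this multifurcation
        else:
            parts.append(perturb(child, rng, max_tries))
    for _ in range(max_tries):
        candidate = random_resolution(parts, rng)
        if not (clades(candidate) & collapsed):  # must not rebuild a dropped clade
            return candidate
    raise RuntimeError("no resolution avoiding the collapsed clades was found")


# Usage: a made-up five-taxon consensus tree with made-up support values.
consensus = Node([Node(["A", "B"], support=0.9),
                  Node([Node(["C", "D"], support=0.6), "E"], support=0.4)])
rng = random.Random(1)
for _ in range(3):
    print(perturb(consensus, rng))
```

Well-supported clades such as (A,B) survive in most draws, while the weakly supported clade containing E is usually collapsed and rearranged, which is what spreads the new sample out around the posterior.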
The second was Bret Larget. Bret’s idea is that because subtrees are “approximately” independent of one another, you can use a sample of trees to estimate the probability of trees that were never sampled during the tree search. Furthermore, this approach can be used to estimate the actual probability of a tree, rather than simply taking its probability to be its frequency in the posterior sample produced by the MCMC sampler. I can’t explain very well how the probability of such a tree is calculated, but here is a link to Bret’s talk. It is fairly straightforward, but it has been a long day! The effect is that a “relatively” small sample of trees, examined this way, may contain as much information as a much larger posterior sample examined in the traditional way.
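Here is a minimal Python sketch of the kind of calculation I think Bret described, assuming it works by multiplying conditional clade probabilities (my assumption; I believe this is usually called a conditional clade distribution, but check his talk for the real details). Rooted binary trees are written as nested 2-tuples of taxon names. For every clade in the sample, the sketch counts how often each way of splitting that clade in two occurs; the probability of any tree is then the product, over its internal nodes, of the estimated probability of that split given the clade. The class name `ConditionalCladeModel` is hypothetical.

```python
from collections import Counter


def leaves(t):
    """Leaf set of a rooted binary tree written as nested 2-tuples of taxon names."""
    if isinstance(t, str):
        return frozenset([t])
    return leaves(t[0]) | leaves(t[1])


def splits(t):
    """Yield (clade, split) for every internal node; the split is an unordered pair."""
    if isinstance(t, str):
        return
    left, right = leaves(t[0]), leaves(t[1])
    yield left | right, frozenset([left, right])
    yield from splits(t[0])
    yield from splits(t[1])


class ConditionalCladeModel:
    """Estimate tree probabilities from clade and clade-split counts in a sample."""

    def __init__(self, sample):
        self.clade_counts = Counter()
        self.split_counts = Counter()
        for tree in sample:
            for clade, split in splits(tree):
                self.clade_counts[clade] += 1
                self.split_counts[clade, split] += 1

    def probability(self, tree):
        """Product over internal nodes of P(split | clade), estimated by counts."""
        p = 1.0
        for clade, split in splits(tree):
            n = self.clade_counts[clade]
            if n == 0:
                return 0.0  # clade never sampled, so no estimate is possible
            p *= self.split_counts[clade, split] / n
        return p


# Usage: a toy "posterior sample" of just two six-taxon trees.
t1 = ((("A", "B"), "C"), (("D", "E"), "F"))
t2 = ((("A", "C"), "B"), (("D", "F"), "E"))
unsampled = ((("A", "B"), "C"), (("D", "F"), "E"))
model = ConditionalCladeModel([t1, t2])
print(model.probability(t1), model.probability(unsampled))
```

This prints `0.25 0.25`. Simple frequencies would give `t1` a probability of 0.5 and `unsampled` a probability of 0, but because the {A,B,C} and {D,E,F} subtrees are treated as independent, all four ways of combining their resolutions get 0.25, including the two trees the sampler never visited.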