I am in Austin this weekend for the
Phylogenomics and metagenomics symposium and workshop organized by
Tandy Warnow. Today was the symposium and tomorrow is the
workshop. Lots of really interesting and
exciting methods were presented today.
Prior to getting here, the thing I was most interested in was
simultaneous estimation of trees and alignments (SATé-type tools), and these were really cool and interesting. However, two speakers really grabbed my
attention with ideas involving “importance sampling”.
Fair warning: Anything that doesn’t make sense below is
certainly due to my own misunderstanding.
The first to do this was
Mark Holder. He was discussing stepping stone sampling to
estimate the marginal likelihood of a model.
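To remind myself what stepping stone sampling computes, here is a minimal Python sketch of the basic estimator on a toy model. This is not the method from the talk. The model is my own assumption (a single observation y with a Normal(theta, 1) likelihood and a Normal(0, 1) prior on theta), chosen only because the power posteriors can be sampled exactly and the true marginal likelihood is known. All function names are made up for illustration.

```python
import math
import random

def log_likelihood(theta, y):
    """Log density of y under Normal(theta, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2

def sample_power_posterior(beta, y, n, rng):
    """Draw from likelihood**beta * prior, which is Normal for this toy model."""
    precision = beta + 1.0
    mean = beta * y / precision
    sd = 1.0 / math.sqrt(precision)
    return [rng.gauss(mean, sd) for _ in range(n)]

def stepping_stone_log_ml(y, betas, n, rng):
    """Sum the log ratios between successive power posteriors (beta 0 to 1)."""
    total = 0.0
    for b_prev, b_next in zip(betas, betas[1:]):
        thetas = sample_power_posterior(b_prev, y, n, rng)
        logs = [(b_next - b_prev) * log_likelihood(t, y) for t in thetas]
        biggest = max(logs)  # log-sum-exp trick for numerical stability
        total += biggest + math.log(sum(math.exp(v - biggest) for v in logs) / n)
    return total

rng = random.Random(0)
betas = [k / 10 for k in range(11)]  # stepping stones from prior to posterior
estimate = stepping_stone_log_ml(1.0, betas, 2000, rng)
exact = -0.5 * math.log(4 * math.pi) - 0.25  # y is marginally Normal(0, 2)
print(estimate, exact)
```

In a real phylogenetic analysis each set of samples would come from an MCMC run rather than an exact draw, which is where the hard part lies.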
Part of this process involves generating trees that are similar to those in
the posterior sample but were not actually sampled.
The challenge here is to do this in an informed way, where you use the
information in the posterior to guide your choice of trees. The way this works is that you look at
a consensus tree, which gives you a posterior probability for each edge. To produce a new tree, just
work your way through the consensus tree, retaining edges in proportion to
their posterior probability. This will
leave you with a bunch of multifurcations, which need to be resolved in any way
other than the way they are found in the consensus tree. This process allows you to produce a new
sample of trees that is centered on the posterior distribution but more spread
out.
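The edge-retention step can be sketched in a few lines of Python. This is only my reading of it, and it makes two assumptions: each consensus edge is represented as the set of taxa on one side of it, and each edge is kept independently with probability equal to its posterior support. The function name and the support values are hypothetical, and the step that resolves the resulting multifurcations is left out.

```python
import random

def retain_edges(edge_support, rng):
    """Keep each consensus edge with probability equal to its posterior support.

    edge_support maps an edge (a frozenset of taxon names) to a value in [0, 1].
    Dropped edges collapse into multifurcations that still need resolving.
    """
    return {edge for edge, p in edge_support.items() if rng.random() < p}

# Hypothetical edge supports read off a consensus tree.
support = {
    frozenset({"A", "B"}): 0.95,
    frozenset({"A", "B", "C"}): 0.60,
    frozenset({"D", "E"}): 0.30,
}
rng = random.Random(42)
for _ in range(3):
    kept = retain_edges(support, rng)
    print(sorted(sorted(edge) for edge in kept))
```

Well-supported edges survive in most draws while weak ones are usually collapsed, which is what keeps the new trees near the posterior.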
The second was
Bret Larget.
Bret’s idea is that because subtrees are “approximately” independent
of one another, you can use a sample of trees to estimate the probability of
trees that were never sampled in the tree search process. Furthermore, this process can be used
to estimate a true probability of a tree rather than simply considering its
probability to be equal to its frequency in the posterior sample produced by
the MCMC sampler. I can’t explain the way
that you calculate the probability of such a tree very well, but here is a link
to
Bret’s talk. It is fairly
straightforward, but it has been a long day! The
effect of this is that a “relatively” small sample of trees may actually
contain just as much information (when examined in this way) as a much larger
traditional posterior sample examined with the traditional approach.
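Here is my rough attempt at the general idea in Python, and it should be read as a guess rather than as Bret’s actual calculation. It assumes rooted binary trees written as nested tuples, and it assumes the probability of a tree can be approximated as a product, over its clades, of how often each clade was split that particular way in the sample. All names are made up for illustration.

```python
from collections import Counter

def collect_splits(tree, out):
    """Return the leaf set of `tree`; append (clade, split) for each internal node."""
    if isinstance(tree, str):
        return frozenset([tree])
    left = collect_splits(tree[0], out)
    right = collect_splits(tree[1], out)
    clade = left | right
    out.append((clade, frozenset([left, right])))
    return clade

def count_splits(sample):
    """Count how often each clade appears and how often it splits each way."""
    clade_counts, split_counts = Counter(), Counter()
    for tree in sample:
        splits = []
        collect_splits(tree, splits)
        for clade, split in splits:
            clade_counts[clade] += 1
            split_counts[(clade, split)] += 1
    return clade_counts, split_counts

def tree_probability(tree, clade_counts, split_counts):
    """Multiply the conditional split frequencies over all clades in the tree."""
    splits = []
    collect_splits(tree, splits)
    prob = 1.0
    for clade, split in splits:
        if clade_counts[clade] == 0:
            return 0.0  # clade never seen in the sample
        prob *= split_counts[(clade, split)] / clade_counts[clade]
    return prob

# Two sampled trees that disagree inside both subtrees.
sample = [
    ((("A", "B"), "C"), (("D", "E"), "F")),
    ((("A", "C"), "B"), (("D", "F"), "E")),
]
clade_counts, split_counts = count_splits(sample)
unsampled = ((("A", "B"), "C"), (("D", "F"), "E"))
print(tree_probability(unsampled, clade_counts, split_counts))
```

The third tree mixes the left subtree of the first sample with the right subtree of the second, so it gets a nonzero probability even though it was never sampled.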