Today I needed a function to extract all the evolutionary pathways on a phylogentic tree.  Basically all the routes from the root to a tip.  The key to doing this is just understanding the edge element of a phylo object in R.  It is a 2 by X matrix where X is the number of branches in the tree.  The first column is a starting node number and the second column is an ending node number.  With a little thought you can come up with some rules for where different types of nodes will be in this table:

1) the root will only be in the first column (two times assuming strictly bifurcating trees)

2) the tips will only be in the second column (one time each). 

3) interior nodes will be present in both columns (twice in the first and once in the second)

We can confirm this visually with a little plotting:














With these rules in mind we can use a nice little dynamic algorithm to create a vector for each of our evolutionary pathways:

1) pick a tip node as the first element in our vector

2) search the second column of the table to find  the row that matches the last value in our vector

3) add the value in the first column of this row to our vector

4) check if the last value in our vector is the root node; 
        if no return to step 2
        if yes add the vector to a list of vectors

5) check if there are any tip nodes remaining 
        if yes return to step 1

6) for convenience reverse the vectors so they start at the root and go to the tips

7) return the list of pathways to the user




I've added this to the GitHub version of evobiR

If you have installed the latest version then you simply run:

getPaths(tree)


but here is the underlying code that it uses as well:

 getPaths <- function(tree, type) {  
  # extract the table  
  tab <- tree$edge  
  # id the tips - occur in right column but not left  
  tips <- tab[, 2][!tab[, 2] %in% tab[, 1]]  
  # could look at vector of node labels  
  # should be just ntips  
  # id the root - occurs in left column but not right  
  root <- (tab[, 1][!tab[, 1] %in% tab[, 2]])[1]  
  # create a list to store results in  
  paths <- list()  
  # a loop to go through all tips  
  for (i in 1:length(tips)) {  
   # pull the current tip  
   x <- tips[i]  
   # vector to store branches in  
   if(type == "branch") bl <- vector()  
   # checks to see if we found root yet  
   while (x[length(x)] != root) {  
    # which node leads to the most recently sampled node  
    y <- tab[which(x[length(x)] == tab[, 2]), 1]  
    # the row of tab is equivelant to the branch index so  
    # we get our branch id this way  
    if(type == "branch"){  
     bl <- c(bl, which(x[length(x)] == tab[, 2]))  
    }  
    # store our new node  
    x[(length(x) + 1)] <- y  
   }  
   # lets reverse it because I think root to tip  
   if (type == "node") paths[[i]] <- rev(x)  
   if (type == "branch") paths[[i]] <- rev(bl)  
  }  
  return(paths)  
 }  

0

Add a comment

Great Blogs
Great Blogs
About Me
About Me
My Photo
I am broadly interested in the application and development of comparative methods to better understand genome evolution at all scales from nucleotides to chromosomes.
Subscribe
Subscribe
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.