A couple of years ago I posted a bit of code that lets you plot data in a ternary diagram.  This is a useful way to display proportion data when proportions are split between three possible states.  Unfortunately, my original post included a link to the underlying data in my dropbox account.  This has led to me accidentally deleting it a couple of times.

Well I did it again.  I decided that the best solution would be to just post some new code that included simulation of the data.

The original data was generated under a phylogenetic model and the data represented the number of species in each of 3 possible states.  If you would like to regenerate data for that approach you will need an R package that allows for discrete data simulation on a phylogenetic tree.  A fairly easy to install package that can do this is geiger.  Below is the code for generating the an appropriate dataset.

 ## phylogenetic approach to simulating data  
 library(geiger)  
 tree <- sim.bdtree(b=1, d=.4, stop=c("taxa", "time"), n=200, seed=0, extinct=T)  
 tree <- drop.extinct(tree)  
 q <- list(rbind(c(-.06, .05, .01), c(.09, -.1, .01), c(.01, .01, -.02)))  
 simresult <- matrix(1,200,4)  
 for(i in 1:200){  
  sim.root <- sample(1:3, prob=c(.55,.35,.1), size=1)  
  phylosim <- sim.char(phy=tree, model="discrete", par=q,   
             root=sim.root)  
  simresult[i,2] <- sum(phylosim == 1)  
  simresult[i,3] <- sum(phylosim == 2)  
  simresult[i,4] <- sum(phylosim == 3)  
  simresult[i,1] <- sim.root + 1  
 }  
 simresult[1,1] <- 1  
 rm(i, phylosim,q,sim.root,tree)  


If you are not interested in phylogenetics you can generate an appropriate dataset in a non-phylogenetic framework like this:


 ## non-phylogenetic approach to simulating data  
 simresult <- matrix(1,200,4)  
 set.seed(4)  
 simresult[,2] <- sample(1:498, size=200)  
 for(i in 1:200){  
  simresult[i,3] <- sample(1:(499 - simresult[i,2]), size=1)  
  simresult[i,4] <- sample(1:(500 - (simresult[i,2] + simresult[i,3])), size=1)  
  simresult[i,1] <- sample(2:4, size=1, prob = c(simresult[i,2],   
                          simresult[i,3],   
                          500 - (simresult[i,2] + simresult[i,3])))  
 }  
 simresult[1,1] <- 1  


Finally with either of these datasets you can then plot using the vcd package.

 ## plotting simulated data  
 require("vcd")  
 colnames(simresult) <- c("Root", "XY", "XO", "Xyp")  
 simresult <- as.data.frame(simresult)  
 colors <- c("black","red","green", "orange")  #This sets up a vector of colors  
 ternaryplot(                  #This is the actual plotting of our data  
  simresult[,2:4],               #Here I provide the file and columns to plot  
  pch = 20,                   #This is choosing the shape of data points (simple circle here)  
  cex = .5,                   #This is the size of the data points  
  col = colors[as.numeric(simresult$Root)],   #Telling it to color the points  
  main = "PPS - 3-State Model"         #Provides a title for the graph  
 )  
 grid_legend(x="topright", pch=20, col=colors, labels=c("Observed", "XY", "XO", "Xyp"),title = "Root States")  #Sets up a legend  

This script will then produce:


cheers


0

Add a comment

Great Blogs
Great Blogs
About Me
About Me
My Photo
I am broadly interested in the application and development of comparative methods to better understand genome evolution at all scales from nucleotides to chromosomes.
Subscribe
Subscribe
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.