27 August 2015

Chromosome Evolution Conference


I had the good fortune of speaking at the American Genetics Association president’s symposium last week.  There were many things to enjoy at this conference.  First the location was Bainbridge Island just across the bay from downtown Seattle.  Second, despite a striking diversity of particular subfields all speakers shared a broad interest in chromosomal evolution and were interested in sharing ideas and approaches and finding ways to move the field forward.  There was also relative equality in the distribution of sexes with 10 female and 12 male speakers.  Likewise the poster session featured 14 female and 16 male presenters.  Finally, if the talks and posters weren’t sufficient then the opportunity to socialize and discuss science certainly were.  All meals were catered and there were no concurrent sessions.  This meant that you got to spend 2 ½ days sharing ideas and learning from each other without ever missing a talk or poster.  So many thanks to Katie Piechel for putting together such an awesome conference!  Below are a few highlights that I found particularly exciting.

Jeremy Searle from Cornell gave a great talk covering hybrid zones and speciation in mice and shrews.  He spoke quite a bit about the amazing variation in chromosome number that he has found on the island of Madeira.  This hit home for me because in my analysis of chromosome number it ends up that some of the fastest rates that I have observed are driven largely by the amazing diversity that I observe in species from the Canary islands just south of Madeira.  Its always reassuring when you see the same kind of patterns in such distant parts of the tree of life.

Karen Miga from University of California, Santa Cruz discussed her work trying to shed light on the dark side of the genome.  To be honest I’m not sure how to use the work that she has done but I think that her work represents a huge step forward in giving us some usable characterization of the nature of centromeric regions of chromosomes.  For a lay introduction to her work you can check out her blog post at Scientific American or if you're feeling a bit braver here is a nice article on a preprint server.

Beatriz Vicoso who recently took a position at the Institute of Science and Technology in Austria talked about some of the amazing work that she did while she was a postdoc with Doris Bachtrog.  Specifically the suprising discovery that heteromorphic sex chromosomes are not always a trap.  Indeed many of the mysterious attributes of the dot chromosome in Drosophila now make sense once we realize that it spent millions of years as a sex chromosome.  To read about this check out their article in PLOS Biology.

I could and should add another 10 or 12 talks and posters but I probably need to focus on my own papers instead.  Keep your eyes out for next years AGA president’s symposium in Asillomar; it will focus on local adaptation and will be organized by Lynda Delph.

Hadley and Winston talk about the future of interactive graphics

I went to my first R users group meeting in Minnesota and was treated to talks by two R heavy weights. First Hadley Wickham gave a talk about what he views as the most exciting and dynamic area of R development (interactive graphics) and this was followed by a short talk by Winston Chang showing what he has been doing to develop interactive graphics a part of the Shiny environment.  Below are some take home messages and a few links that I thought were particularly interesting.  

Hadley Wickham: First the slides for Hadley’s talk are on his github page.  Hadley started his talk off with a short demo showing how interactive graphics could allow you to learn about your data.  He has a clever dataset of 3d data points and these show no apparent pattern when viewed along any two axes.  However when viewed from the correct angle we discover that there is a very distinct spiral pattern in the data this provides a nice aha moment in his talk.  Next Hadley discussed the history of interactive graphics in R splitting them into 3 categories 1) those written in lower level languages, 2) those hacking existing R plotting functions, and 3) and browser based solutions.  Many of the packages that he showed are not very realistic for everyday use or for investing a lot of time since they are no longer being actively developed.

Perhaps the most interesting part of Hadley’s talk was his discussion and demonstration of ggvis this is Hadley’s package that he envisions eventually replacing ggplot2.  His goal is to continue to work on this and perhaps sometime in 2016 have it to the point that anything you could do in ggplot2 you can do in ggvis.  The upside of this is that if you are already comfortable with ggplot2 you will have no trouble transitioning into ggvis.  Hadley is using the same type of grammer based approach to building up more complex graphs from simpler elements.

Winston Chang gave a much shorter talk but illustrated some of the new interactive developments in Shiny.  Despite my use of Shiny I was actually completely unaware of this development.  Winston has added some nice examples on the Shiny articles page (bottom of right column).

Other interesting links from the evening:

Html widgets for R: allows you to use a variety of javascript visualization libraries within R

Hadley’s new book Advanced R

21 April 2015

Beetle behavior - do they avoid, aggregate, or simply wander

I have a student this semester who was interested in determining whether a group of beetles in a confined space (petri dish) avoided each other, aggregated in groups, or perhaps simply wandered randomly.  I don’t know of an existing statistical test to determine the answer to this question.   However, it seemed that perhaps the mean pairwise distance between all beetles might be a good measure of where an organism falls along this continuum.  For example if we have a square grid that is 10 units on a side and we put one beetle roughly in each corner:

The pairwise measures are:
The average of these measures is 10.24

In contrast if we have the same grid but the beetles have aggregated into two clumps:

The pairwise measures would now be:
The average of these measures is 8.48

Spread out individuals give us a larger number clumping/aggregation leads to a smaller number - this is what we are after.  Once she collected her data we needed to create a null distribution of expected pairwise distances assuming that the beetles were randomly distributed through the available space.  To do this we simply need to create the same number of random X and Y coordinates (making sure they fall in the circle of our petri dish) as she had in her experiment. Once we have a set of random coordinates we can then measure all the pairwise distances.  We repeat this process thousands of times to generate a null distribution of what we expect to see.  Here is what the pseudocode for this looks like

Radius of circle = 
Number of beetles =
Iterations =
Observed pairwise mean =

Loop for iterations
-Loop for number of beetles
--While coordinate not found
---Generate a coordinate
---Test coordinate for falling inside circle if true save
--Calculate mean of all pairwise distances and record
Plot null distribution and observed value

Below is the actual R code that I used to accomplish this.  I decided to tweak it a bit and turn it into a shiny app that allows her to run it through the web thanks the free shiny apps server.  The fully implemented version is here

 # This code produces pairwise measurement for  
 # randomly placed points in a circle of arbitrary size  
 # Heath Blackmon & Elizabeth Mallory  
 # 20 April 2015  
 RandomDist <- function(radius=10,   
  # this internal function just calculates a pairwise distance  
  pw.dist <- function(x1, x2, y1, y2){  
   sqrt(((x1 - x2) ^ 2) + ((y1 - y2) ^ 2))  
  # this function tests whether a point fall inside (or on)  
  # a circle of given radius  
  in.circle <- function(x, y, radius){  
   (x - radius)^2 + (y - radius)^2 < radius^2  
  # we will store our final null dist here  
  final.results <- vector()  
  # this loop will repeat till iterations is reached  
  for(k in 1:iterations){  
   # this print statement allows us to monitor progress  
   if(k/10 == round(k/10)) print(paste("iteration", k))  
   # we will store our coordinates in this dataframe  
   coords <- data.frame(x=0,y=0)  
   # this will loop through for the number of individuals  
   for(m in 1:individuals){  
    # this flag will be false until we generate a   
    # valid coordinate  
    inside <- F  
    # we generate the coordinates in this while statement  
    while(inside == F){  
     # pick coordinates from uniform distribution  
     t.coords <- runif(n=2, min = 0, max = radius*2)  
     # now test if they are in circle  
     if(in.circle(x=t.coords[1], y=t.coords[2], radius)){  
      inside <- T  
    coords[m, 1:2] <- t.coords  
   colnames(coords) <- c("X", "Y")  
   # this calculates the cumulative sum of the number of   
   # individuals - 1 (this is equivelant to the number of   
   # pairwise measures)  
   pairnum <- tail(cumsum((individuals - 1):1), n=1)  
   pairmeas <- vector()  
   counter <- 2  
   # this loop is going to do the pairwise measures  
   for(i in 1:(individuals-1)){  
    for(j in counter:individuals){  
     # here we do an actual pairwise calculation  
     x <- pw.dist(coords[i,1],  
     pairmeas <- c(pairmeas, x)  
    counter <- counter + 1  
   # store result from each iteration  
   final.results[k] <- mean(pairmeas)  
  # calculate emperical p-value  
  lower <- sum(final.results>observed)  
  higher <- sum(final.results<observed)  
  if(lower < higher){  
   pval <- lower/iterations  
   pval <- higher/iterations  
  # plot the results  
     main = "Distribution of Mean Pairwise Measures",   
     xlab = "mean distance",  
     ylab = "density",  
  abline(v=observed, col="red", lwd=2)  
  # pull in the plot limits  
  mylims <- par("usr")  
     paste("pvalue =", pval),  

17 April 2015

R: concatenate alignments into a supermatrix

Today a friend asked me the easiest way to concatenate several hundred fasta format alignments that include some differences in the species present in each alignment into a single supermatrix. For example if we had two alignments like this:

>species A
>species C

>species A
>species B

we would want to produce a final alignment that looks like this:

>species A
atcgatcgtagctcgtacgtcgtacg---act cgt---acgtcgtactgtc
>species B
-------------------------------- cgtcctacgtcgtactgtc
>species C

We would also want to keep track of the data partitions that have been combined so we would want a report like this as well:

Locus 1 1:33
Locus 2 34:52

I assumed that this would be a simple task in R – surely somebody has a function for this.  However, after checking some of the big packages like APE I found that there wasn’t an obvious and easy way to do this.  For instance in ape while there is c method for dna object it expects them to have the same taxa and it drops any taxa not in all alignments.  I am in the process of writing a manuscript for my package evobiR so I decided I should write a function for this. Here is the code that I came up with.

 SuperMatrix <- function(missing = "-",  
             prefix = "concatenated",  
             save = T){  
  # get file names  
  file.names <- list.files()  
  # read DNA  
  DNA <- list()  
  for(i in 1:length(file.names)){  
   print(paste("Reading alignment", i))  
   DNA[[i]] <- read.dna(file=file.names[i],   
              format = "f",   
  # calculate total alignment length  
  total.length <- 0  
  for(i in 1:length(DNA)){  
   total.length <- total.length + ncol(DNA[[i]])  
  # get all taxa names  
  taxa <- vector()  
  for(i in 1:length(DNA)){  
   counter <- length(taxa) + 1  
   for(j in 1:nrow(DNA[[i]])){  
    taxa <- c(taxa,   
         row.names(DNA[[i]])[!row.names(DNA[[i]]) %in% taxa])  
  # create empty alignment  
  seqmatrix <- matrix(as.character(missing),  
  rownames(seqmatrix) <- taxa  
  # create partition record holder  
  partitions <- as.data.frame(matrix(,length(DNA),3))  
  colnames(partitions) <- c("part", "start", "stop")  
  #build the supermatrix  
  c.col <- 0  
  print("Creating supermatrix")  
  for(i in 1:length(DNA)){  
   print(paste("Processing alignment", i))  
   gene <- DNA[[i]]  
   for(j in 1:nrow(gene)){  
    c.row <- which(rownames(seqmatrix) == rownames(gene)[j])  
    seqmatrix[c.row, (c.col + 1):(c.col + ncol(gene))] <- gene[j, ]  
   partitions[i,1:3] <- c(file.names[i], c.col + 1, c.col + ncol(gene))  
   c.col <- c.col + ncol(gene)  
  results <- list()  
  results[[1]] <- partitions  
  results[[2]] <- seqmatrix  
  if(save == T){  
   print("saving files")  
        file = paste(prefix, ".fasta", sep = ""),   
        format = "f")  
        row.names = F,   
        file = paste(prefix, ".partitions.csv", sep = ""))  

While I haven’t done anything to try and make this very fast testing it on my friends radseq data it was able to construct a supermatrix for 18 taxa and 200,000 bp in just a couple of minutes so probably is already fast enough for most purposes.  This function has already been loaded into the current version of evobiR available from github If you find this useful or see a potential problem please let me know.


28 August 2014

Recent Publications: Beetle and Otherwise

Haven't had a lot of time for updates to the blog lately.  However, I have a few beetle related publications recently.

1) An article that contains a lot of my dissertation research came out this summer in Genetics.  In this paper I use the karyotypes, specifically sex chromosome morphology, from over 1,000 beetles to make inferences about the mode and tempo of sex-limited chromosome turnover across Coleoptera.  The results of this study inspired the "fragile Y hypothesis".  This hypothesis predicts that the characteristics of meiosis in a clade have a large impact on the rate of Y chromosome turnover and that sexually antagonistic selection reduces the size of the PAR (this is the region of the sex chromosome that forms chiasmata and recombines insuring proper segregation) to such an extent that faithful segregation becomes difficult.  Beetles are an interesting group to test these ideas in because some clades have X and Y chromosomes that do not have a PAR and don’t come together and form chiasmata instead the are held together at a distance by proteins.  While many other species have a more traditional meiosis where a PAR allows for chiasmata insuring proper segregation.  When we compare the rate of Y chromosome turnover in these two groups we find that those with distance pairing sex chromosome are less likely to lose their Y chromosome.  When we look to mammals we see a replication of this pattern suggesting that this is an important aspect of sex chromosome evolution in many clades with heterogametic sex determination.

2) I spent last fall semester at the National Evolutionary Synthesis Center in Durham North Carolina.  During my time there I was working as part of the Tree of Sex Working Group to collect and curate karyotypes and sexual system data for all arthropods.  This work led to over 11,000 records for arthropods which are now part of a database that is approaching 30K records in all across the tree of life.  I also had the chance to make some really cool figures for this paper to help illustrate the distribution of data in our database.

3) Finally I also had the opportunity to write an article for the Ontario Entomological Society doing a review of sex chromosome and chromosome number evolution across Coleoptera.  This was an enjoyable article to write.  I was able to talk a bit about some of the early pioneers that used beetles to discover important aspects of basic biology in the early 1900s.