28 August 2014

Recent Publications: Beetle and Otherwise


Haven't had a lot of time for updates to the blog lately.  However, I have a few beetle related publications recently.

1) An article that contains a lot of my dissertation research came out this summer in Genetics.  In this paper I use the karyotypes, specifically sex chromosome morphology, from over 1,000 beetles to make inferences about the mode and tempo of sex-limited chromosome turnover across Coleoptera.  The results of this study inspired the "fragile Y hypothesis".  This hypothesis predicts that the characteristics of meiosis in a clade have a large impact on the rate of Y chromosome turnover and that sexually antagonistic selection reduces the size of the PAR (this is the region of the sex chromosome that forms chiasmata and recombines insuring proper segregation) to such an extent that faithful segregation becomes difficult.  Beetles are an interesting group to test these ideas in because some clades have X and Y chromosomes that do not have a PAR and don’t come together and form chiasmata instead the are held together at a distance by proteins.  While many other species have a more traditional meiosis where a PAR allows for chiasmata insuring proper segregation.  When we compare the rate of Y chromosome turnover in these two groups we find that those with distance pairing sex chromosome are less likely to lose their Y chromosome.  When we look to mammals we see a replication of this pattern suggesting that this is an important aspect of sex chromosome evolution in many clades with heterogametic sex determination.


2) I spent last fall semester at the National Evolutionary Synthesis Center in Durham North Carolina.  During my time there I was working as part of the Tree of Sex Working Group to collect and curate karyotypes and sexual system data for all arthropods.  This work led to over 11,000 records for arthropods which are now part of a database that is approaching 30K records in all across the tree of life.  I also had the chance to make some really cool figures for this paper to help illustrate the distribution of data in our database.





















3) Finally I also had the opportunity to write an article for the Ontario Entomological Society doing a review of sex chromosome and chromosome number evolution across Coleoptera.  This was an enjoyable article to write.  I was able to talk a bit about some of the early pioneers that used beetles to discover important aspects of basic biology in the early 1900s.



07 April 2014

Sliding window analysis


Sliding window analyses are a common approach when examining very large datasets such as genomic data.  A friend is working to find regions of the genome that may show signs of involvement in a complex trait, and part of this analysis required that he calculate the mean value of a measure for each site in the genome.  The figure below shows the goal of such a function.  Using a window size of 4 and a step size of 2:
There may be a ready made way to do this in R, but I was unable to quickly find it so I just went ahead and wrote one up.  First I made some data to work with:


then I wrote a simple function:

and here is the result of running this function and plotting the result with window sizes of 2, 20, 200, and 400.  




Hopefully this will be helpful to someone.

cheers


25 October 2013

Population Genetics Simulator


There are lots of population genetics simulators available as both standalone programs and online offerings.  However, I decided that I wanted to make my own.  So here is my Fisher-Wright model basic population genetics simulator.  I thought it might be a fun way to try out a new package called “shiny”.  Shiny is a package from the developers of RStudio.  It provides an easy way to create interactive web pages that draw on R for generating the displayed content.  These can be run locally by anyone that has the Shiny package installed or you can set up a shiny server and host them as webpages that will use an instance of R on the server to generate the displayed content.  So here is what it looks like in action:



This isn't on the CRAN version of evobiR yet but if you want to play with it you can get it by installing through GitHub.

install.packages("devtools")
library(devtools)
install_github("evobir", username='coleoguy')
library(evobiR)


ViewEvo("wf.model")



cheers!

Here are links to some other useful simulation software:





15 October 2013

evobiR

Over the course of the last couple of years I have written quite a few scripts in R.  These scripts are more than just my own research, a few I have written for friends, others I wrote tying to understand a topic.  So they run the gamut: teaching simulations, posterior predictive simulations, preparing data for other comparative tools, and analyzing sequences.  I've decided that I should collect all of them and create an R package that I can continue to add to as my research progresses.  So the first and rather rough limited version of this is now on github.  If you want to install it from github its pretty straight forward:


install.packages("devtools")
library(devtools)
install_github("evobir", username='coleoguy')
library(evobiR)


for a description of the functions included in the package you can check out my webpage.

cheers

02 September 2013

SPAM: tree ≠ tree

I dont get a lot of spam on my blog and what I do get seems to be quite poorly targeted.  However this morning I laughed out loud when I got this comment on my entry about using R to removes taxa from a phylogenetic "tree"

Action Tree Service has left a new comment on your post "Pruning Taxa from Trees": 

Tree pruning is very important process of tree care service.Tree Pruning removes specific branches or stems to benefit the whole tree.Tree pruning


Happy Labor Day!

25 August 2013

Green Peas and Hamming Distances


The first problem for today is drawn from Mendel's first rule.

Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous,  and n are homozygous recessive.

Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype).  Assume that any two organisms can mate.





The second problem just wants us to calculate the Hamming distance between two aligned DNA sequences.  This is a really easy problem my answer is embarrassingly long but I didn't really feel like optimizing it since it worked right off.


Given: Two DNA strings s and t of equal length (not exceeding 1 kbp).

Return: The Hamming distance dH(s,t).




24 August 2013

NESCent Fellowship



I've just started a grad fellowship at NESCent.  My main goal for my time here is to produce papers that utilize data from the tree of sex data that I have been helping to collect.  However, I do have a few ancillary goals for my time here one of which is to become a better programmer.  I've asked a few people (way smarter than me) over the last couple of years how to become really good in a particular language.  The most frequent response that I have received is to just do "something" each and every day in whatever language it is that you are working on.  To that end I am supplementing my normal programming with the challenges at Rosalind.  This is a great website that has all kinds of programming challenges that you can attempt to solve.  The site seems to have a python focus, buts lots of members solve problems with every language you can think of.  I’ve decided that I am going to start at the beginning of the bioinformatics challenges and try and solve about a problem a day in R and in Python.  The first ones are really simple so I did three in R today.

Challenge 1: A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.  An example of a length 21 DNA string (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is "ATGCTTCAGAAAGGTCTTACG."
Given: A DNA string s of length at most 1000 nt.
Return: Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in s.



Challenge 2: An RNA string is a string formed from the alphabet containing 'A', 'C', 'G', and 'U'. Given a DNA string t corresponding to a coding strand, its transcribed RNA string u is formed by replacing all occurrences of 'T' in t with 'U' in u.
Given: A DNA string t having length at most 1000 nt.
Return: The transcribed RNA string of t.



Challenge 3: In DNA strings, symbols 'A' and 'T' are complements of each other, as are 'C' and 'G'.  The reverse complement of a DNA string s is the string sc formed by reversing the symbols of s, then taking the complement of each symbol (e.g., the reverse complement of "GTCA" is "TGAC").
Given: A DNA string s of length at most 1000 bp.
Return: The reverse complement sc of s.

cheers from NESCent