06 February 2016

Software Carpentry Round 1

Last week I taught at my first Software Carpentry course.  I was one of 6 instructors for this course (all first-time SWC instructors).  My teaching focused on intermediate R applications (documents and slides).  Lots of things went really well and lots of things went really poorly.  On the whole, students were positive and felt they had learned valuable skills that would assist them in their future work.  That being said, I want to document at least a few lessons learned.

1) Lower expectations:  A number of us (instructors) overestimated how quickly we could cover material.  The sections of my own lesson that went best were the ones where I had two final challenges and could drop one of them if time ran short.

2) Course focus:  With SWC you are locked into a pretty limited set of topics.  However, most of my teaching is outside of SWC and for the second time feedback indicates that most of my students (grad students with a few postdocs or undergrads mixed in) value learning and improving plotting skills more than anything else.

3) Use the post-its better:  The instructors and helpers did a good job of responding to red flags.  However, the green flags could be used much better.  For instance, at the beginning of a live coding session make sure no post-its (red or green) are up.  Then at some point into the process you can ask everyone who is at the same point to put the green flag up.  This will catch those students that hesitate to use the red.  Then you can get them help and they stay involved.

4) Make a schedule of which instructor will be responsible for the etherpad during each session.  We did a fair job but we did have a couple of times when no instructor noticed that a question popped up in the chat.

5) Have at least 2 instructors plan on staying for an extra hour to help students that struggled on either day.

6) It is apparently impossible to get all students to install software ahead of time.  This is the consensus from every bioinformatics course and workshop I’ve attended.

7) Finally, mistakes aren’t the end of the world.  We all hit errors here and there in our live coding, and some of them none of the instructors could solve right away.  Even when there was a delay, we eventually showed the students how to fix the errors, and the students seemed to like this.

8) Enjoy the fun/funny parts like this crazy feedback I got:

31 December 2015

366 manuscripts

Last night there was a bit of a discussion on Twitter about whether or not reading 365 papers was a laudable, realistic, or even useful New Year's resolution.  To be completely honest, I’m not sure how much of a departure from my normal routine this would be.  I tell new grad students all the time that they should be reading the literature more than they are.  I often recommend that they reserve a part of every day for reading; this is what I do.  However, I know that most of my reading does not involve consuming an entire paper.  I like to think that I get the important bits as I look at figures and scan the discussion and results, but I also know that many times I am done with a paper after just reading the abstract.

If I had to make an informed estimate, I would say that reading a paper a day would not be any increase over my baseline.  So my New Year's resolution is to keep track of papers that I read thoroughly, with no promises on the final number.  I’m not committing to reading every page either, but I will only count a paper if I feel like I have engaged enough to evaluate the science being presented.  To keep the overhead low, I am going to update this post by adding the citation and 1-2 sentences of my thoughts for each paper.

1) Sved, John A., Yizhou Chen, Deborah Shearman, Marianne Frommer, A. Stuart Gilchrist, and William B. Sherwin. "Extraordinary conservation of entire chromosomes in insects over long evolutionary periods." Evolution (2015).
- no new chromosomes in drosophila or tephritid flies
- possibly due to lack of telomerase
- suggest that this effect occurs across all diptera but cytogenetic evidence doesn't seem to support this and is not discussed.
2) Rovatsos, Michail, Jasna Vukić, Petros Lymberakis, and Lukáš Kratochvíl. "Evolutionary stability of sex chromosomes in snakes." In Proc. R. Soc. B, vol. 282, no. 1821, p. 20151992. The Royal Society, 2015.
- uses qPCR to look for a set of sex linked and autosomal genes across ~40 species.
- find that caenophidea all have the same genes sex linked on an X (sex specific portion)
- these genes are not sex linked in other groups of squamates that have differentiated sex chromosomes
- would likely miss any additions; you would have to look at many genes and still wouldn't catch them unless the Y copy had been lost

3) Hahn, Matthew W., and Luay Nakhleh. "Irrational exuberance for resolved species trees." Evolution (2015).
- commentary suggesting that we shouldn't be trying to look at trait evolution with a single tree
- conflict among gene trees is telling us something possibly interesting and revealing
- bootstrap values are kinda pointless when it comes to phylogenomic datasets
- seems a bit pessimistic about the ability to overcome this problem by using Bayesian methods and modeling our trait evolution across a sample of trees.

4) Rundle, Howard D., and Michael C. Whitlock. "A genetic interpretation of ecologically dependent isolation." Evolution 55.1 (2001): 198-201.
- extends the Lynch 1991 model (all autosomal loci, approx. LCA) to include environment-dependent genetic effects.  Lots of ways to extend this in SAGA.

5) Hangartner, Sandra, Anssi Laurila, and K. Räsänen. "The quantitative genetic basis of adaptive divergence in the moor frog (Rana arvalis) and its implications for gene flow." Journal of evolutionary biology 25.8 (2012): 1587-1599.
- Environment-dependent maternal and additive genetic effects contribute to divergence in the trait (pH tolerance).  Uses the basic Rundle and Whitlock model above.

6) Edmands, Suzanne, and Julie K. Deimler. "Local adaptation, intrinsic coadaptation and the effects of environmental stress on interpopulation hybrids in the copepod Tigriopus californicus." Journal of Experimental Marine Biology and Ecology 303.2 (2004): 183-196.
- Shows no evidence of local adaptation but a strong suggestion of coadaptation (coadapted gene complexes)
- Overwhelming pattern is F1s more fit and F2s less fit, all relative to P1 and P2.  So I think this is likely dominance x dominance epistasis?  Unfortunately the data are not available.

7) Erickson, David L., and Charles B. Fenster. "Intraspecific hybridization and the recovery of fitness in the native legume Chamaecrista fasciculata." Evolution 60.2 (2006): 225-233.
- F1 plants have low fitness but much of this is recovered by the F6
- Use an approach from Lynch and Walsh 1998 that sets up an expected phenotype accounting for additive and dominance effects, then compares hybrids to this expectation to test for a role of epistasis.
- Not sure that I buy this as sold at the end - this as a source of adaptive variation that could favor homoploid hybrid speciation...

8) Egan, Scott P., and Daniel J. Funk. "Ecologically dependent postmating isolation between sympatric host forms of Neochlamisus bebbianae leaf beetles." Proceedings of the National Academy of Sciences 106.46 (2009): 19426-19431.
- Shows a strong pattern consistent with local adaptation for two races of leaf beetles.
- Would be perfect dataset for extending the Rundle and Whitlock 2001 model with SAGA

Writing these little notes up sucks! Just takes away time from additional reading/writing.   Thinking that I will focus on this instead:


17 October 2015

U of M EEB seminar - Archie

I’m going to keep this week's post short - grant and code writing are hanging over my head as I type this.

The seminar this week was given by Elizabeth Archie from the University of Notre Dame. Her talk title was "Social relationships, health and fitness in wild baboons." The portion of her research that she shared with us today focused on how social behavior contributes to health. Her work in this area has focused on baboons. She studies two social groups in Amboseli National Park in Kenya. These groups have many similarities to humans; for instance, they are terrestrial and contain multiple male and female adults.

There are some aspects of social behavior that have very obvious consequences for health. For instance, we know that social lifestyles can lead to greater risk of harmful infections. However, we understand less about the possible positive effects. Do social interactions contribute to differences in the gut microbiome? Do these changes have a positive or negative net effect on health and longevity?

A first step in understanding this might be to try to understand what determines the character of the gut microbiome. We know that social partners have more similar gut microbiomes than nonsocial partners, but we don’t know exactly why. Social partners often share diets, genetic ancestry, and habitat. Are any of these key in determining the microbiome?

Archie’s group has begun to unravel this through an ambitious project using metagenome shotgun sequencing of fecal samples. They sampled 48 adults from two groups over a one month period. The two groups were adjacent and each had a range of around 12.5 square miles during this month. This is an awesome dataset that allows lots of really interesting questions to be asked:

  1. Does social group explain differences in microbiome?
  2. Do social networks within groups explain differences in microbiome?
  3. Are all microbe groups equally influenced by social behavior?

To test this first question, we can look at a simple PCA and ask whether differences in the taxonomic groups or genes present in the shotgun samples are able to differentiate the two groups. Indeed, the PCA shows that we can easily separate these two groups. Social group membership explains about 18% of the difference in the taxonomic makeup of the gut microbiome, and about 10% of the differences in the genes present in the gut microbiome. I was curious about this difference (18% versus 10%). Does it have important implications? To me this suggests that regardless of the species mix we have in our gut microbiome, we converge on a community that is adapted to accomplish similar metabolic goals. I'm not sure exactly what the appropriate test for that would be, and this is probably already well established (not my subfield).

This is a screen shot of a figure from the 2015 eLife paper. On the left we have a PCA based on the taxonomy of the sequences, and on the right one based on KEGG enzyme orthology (type of genes). Mica and Viola are the names of the two social groups being studied.

Now for question two. Do differences in within group social behavior have predictable contributions to gut microbiome content? This is where the nature of this dataset gets cool. It ends up that these 48 baboons that were sequenced have also been observed and their social interactions have been extensively documented. This results in pairwise measures of social connection between all individuals in a population. They have great looking figures of this (I’m not sure how much quantitative information an average user extracts from these - could the position of the figures be optimized to minimize crossing lines? - perhaps they are?).

This is a screen shot of a figure from the 2015 eLife paper. The thickness of the line indicates the strength of social connection between individuals.  Social connections are measured based on grooming between pairs - Elizabeth tells us that this is a good measure of social connection since it is the primary activity used to build and maintain friendships in these baboons.

What we see (in both social groups) is that the strength of social interaction between two individuals is correlated with the similarity of their gut microbiomes.

Now for that final question: is it only certain microbes that are producing this signal? We could easily imagine that many microbes are ubiquitous and widely available from the environment. Perhaps it is only for a subset (fragile ones that don’t last long in the “wild”) that social behavior is the determining factor in their presence or absence. The approach that Archie’s group took was to see if the same taxonomic groups show enrichment between the two main groups and within the social network of one group. They found that there is indeed a subset of taxonomic groups that seem to be prone to the effects of social behavior. To take this a step further, the authors turned to the Genomes Online Database to ask if the identified groups tended to be non-spore-forming species that might not persist outside of a host. This is indeed what the authors found: the identified groups were “consistently enriched (relative to all species or genera tested) for an anaerobic, non-spore forming lifestyle.”

This was only the first half of Beth’s talk. She also shared fascinating work that her group has done looking at the effect of early life hardship on longevity. You can find links to this and some of the group's other fascinating work below.

The Archie Lab

Elizabeth A. Archie Google Scholar Profile

Manuscripts resulting from the projects mentioned above:

Social networks predict gut microbiome composition in wild baboons.

Social affiliation matters: both same-sex and opposite-sex relationships predict survival in wild female baboons.

09 October 2015

U of M EEB seminar - Castoe

Today’s seminar speaker was Todd Castoe from the University of Texas at Arlington, where I did my Ph.D.  While Todd wasn’t on my committee, my grad student office was directly across the hall from his, so it was great to catch up with him.  His talk was titled: Genome-wide evidence for perturbed systems as hotspots for adaptation.  Writ large, this was a talk about convergence and the topography of the adaptive landscape.

The idea of adaptive landscapes is perhaps one of the oldest metaphors in evolutionary biology.  For Todd’s talk we will consider the high points in these landscapes as phenotypic combinations that provide high fitness and the pathway followed to travel between peaks as the evolutionary changes responsible for moving between peaks. 

Two alternative adaptive landscapes:

Here the adaptive landscape has many peaks and there are many potential pathways to reach any of these many peaks

In contrast there could be only one reasonable path between only a handful of local optima

We can generate expectations for convergence under each of the above conceptions of the adaptive landscape.  In the first, we would expect convergence at the phenotypic level to be relatively more common than convergence in the underlying molecular changes responsible for the phenotypes.  In contrast, in the second case some form of constraint limits the pathways that can be used to move from one optimum to another, and we should expect both phenotypic and molecular convergence to be more common.

Todd began his talk by suggesting that increasing evidence (including a great example of mitochondrial convergence in squamate reptiles) points to an adaptive landscape that is highly constrained and more like the second metaphorical picture.

Todd then showed us a variety of projects that his lab is working on that may shed light on the nature of this landscape and the nature of convergence.  For the sake of brevity I’m going to talk about just one of these. 

This project is a story centered on the amazing physiology of the Burmese python.  In its natural habitat the Burmese python often feeds very rarely but consumes meals that may weigh as much as 50% of the snake’s own mass.  After these large feedings the snake may fast for many months.  This pattern of large meals followed by long fasting is not unique to Burmese pythons and is also present in some of the large vipers.  The amazing part of this story, though, is the energy-saving solution these large snakes have found.  During fasting periods the snakes have evolved to lose much of the physiological structure necessary for the digestion of these meals.  The snakes exhibit massive reductions in the size of organs such as the heart, liver, and kidneys, and they even lose some structure within organs.  For instance, microvilli in the small intestine are lost during fasting periods.  These adaptations allow the snake to achieve some of the lowest metabolic rates measured in vertebrates during fasting periods, but return to more typical metabolic levels after feeding.  The phylogenetic distribution of these traits suggests that some of the necessary machinery for this physiological remodeling is likely quite ancient and has been fine-tuned or lost in contemporary clades.

The Castoe lab is now doing exciting work looking at the evolution of introduced Burmese python populations in Florida.  There are many differences in the environment that the invasive populations are living in.  The obvious and well documented difference is low temperatures.  For instance winter mortality appears to be as high as 40-90% in the invasive population.  Because of this it was expected that genome scans would reveal selection on genes important in cold tolerance.  Genome scans revealed that approximately 80 genes show strong signs of selection in this invasive population.  But, genes important in cold tolerance only accounted for a fraction (approximately 10%) of these 80.  What did the rest of the genes under selection look like?  Preliminary results suggest that the majority are those genes that have already been identified as being important in the evolution of physiological remodeling.  Why would these genes suddenly come under selection?  It ends up that in Florida prey items are abundant year round and snakes that do not remodel their organs may have higher fitness.  The story then may be a particularly fascinating version of convergent evolution where this introduced population is converging on a phenotype of its ancestor.  Returning to the analogy of adaptive landscapes at the beginning of the post then we would be seeing something like this:

  When we look at the Burmese python in Florida we are seeing a lineage that has used largely the same system of genes to move from no physiological remodeling to physiological remodeling and now back again… effectively converging on an ancestral phenotype and using the same pathway to do it!

Todd covered an impressive body of work in his talk and any mistakes or unjustified extrapolations are likely my own.  Below are links to some of the papers from his lab that focus on these topics:

30 September 2015

U of M - EEB seminar - Lockwood

Today we had an interesting seminar with Julie Lockwood from Rutgers University.  Her talk title was Killing the cuddly: Tactics for managing exotic predators to protect their native prey.  Her talk was divided into two sections: the first dealt with the importance of model choice in providing guidance to managers, and the second dealt with attempts to understand population growth in invasive species.

Importance of modeling
She motivated the first part of her talk with an empirical example from Rayner et al. 2007.  This manuscript looks at the breeding success of Cook’s petrels under three conditions: first, prior to any work being done to remove predators, so that both cats and rats were present (success = 0.25-0.50); then after cats had been removed but rats were still present (success = 0.0-0.25); and finally once both had been removed (success = 0.50-0.75).  Here is the graph that she showed from that paper:

She then explained that most model informed management decisions that are currently made use models that only examine the predator population and look for the life stage that can be targeted to most quickly reduce its population.   She suggests instead we need to take a more nuanced approach with a more complex model and use the target of keeping the prey alive rather than dropping the population of predator as quickly as possible.

She then described work that developed a model that allowed the predator to affect the prey population size.  This model can accommodate biological realities such as the fact that juvenile predators often can’t eat adult prey, or that only certain stages of prey are targeted at all.  Once this model was developed they did a series of full block experiments with short, medium, or long lived predators and prey, for a total of 9 pairings.  The general result was that, particularly in the case of long lived predators, traditional approaches and her recommended approach found different optimal solutions.

- my biggest question from this part of the talk was: Can we wrap in two predators and get the "right" solution to the motivating example of Cook's petrels?

Population Lag
The next part of her talk was focused on population lag.  This is a well documented characteristic, particularly in invasive plants.  It is characterized by a species being present at low density for an extended period of time and then at some point suddenly exhibiting a great increase in population size.  In plants it appears that population lag can range from as little as a decade to as long as a century or more.  The data available for vertebrates, though, are less clear, primarily because few introduced vertebrates have abundant data on population size over time.  However, Hawaiian birds offer the opportunity to study this phenomenon.  Her team used the Audubon Christmas bird count data to look at population dynamics of 54 invasive species in Hawaii.

I’ll be honest, this part of the talk got a little fuzzy for me.  The data are very noisy, and so apparently some method was used to define a maximum population size and data after this point were discarded (I believe).  The remaining data were fit with a number of models: 1) simple linear, 2) log, and 3) two-piece linear.  For the two-piece linear model, basically every point between the 6th year and the 5th year prior to peak population was tried as the break, and the break that resulted in the highest likelihood was kept.  If the single-break model was better than the alternatives, then that was evidence for population lag in that species.  Her results indicated that most species did exhibit a population lag.
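To make the break search concrete, here is a minimal R sketch of how I picture it.  This is my own reconstruction and not the method from the talk: the counts are simulated, the two-piece model is assumed to be continuous at the break, the "log" model is assumed to be a regression on log(year), and the comparison uses AIC with one extra parameter charged for the estimated break.

```r
# Simulated counts for one hypothetical species: slow growth, then a
# sharp increase after year 25 (all values here are made up).
set.seed(42)
year  <- 1:40   # the last year stands in for the year of peak population
count <- 10 + 0.2 * year + 5 * pmax(year - 25, 0) + rnorm(40, sd = 2)

# The two simpler alternatives: straight line and regression on log(year).
fit_linear <- lm(count ~ year)
fit_log    <- lm(count ~ log(year))

# Two-piece linear model: the slope is allowed to change at `brk`.
fit_two_piece <- function(brk) lm(count ~ year + pmax(year - brk, 0))

# Try every break from the 6th year to the 5th year before the peak
# and keep the one with the highest likelihood.
candidates <- 6:(max(year) - 5)
log_liks   <- sapply(candidates,
                     function(b) as.numeric(logLik(fit_two_piece(b))))
best_break <- candidates[which.max(log_liks)]
fit_best   <- fit_two_piece(best_break)

# Compare by AIC; the +2 charges the two-piece model for the break itself.
aic_simple    <- min(AIC(fit_linear), AIC(fit_log))
aic_two_piece <- AIC(fit_best) + 2
lag_supported <- aic_two_piece < aic_simple
```

With real count data the search would be run once per species, and `lag_supported` would be the per-species evidence for a lag.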

- my biggest question/comment from this part of the talk was: Seems like one of those places where we need to develop or assess model adequacy, I could picture the best model here being too poor to reveal biologically important characteristics.

All in all this was a really interesting talk.  My final take home from the talk was that most invasive species provide us with a long period of time when population grows slowly and during this time careful modeling can provide us with the best chance of having a successful attempt in controlling an invasive species.  In addition even if we miss this low population window populations are stochastic and so perhaps we should be prepared to attack an invasive when it experiences a natural fluctuation.

Here are links to the Lockwood lab and the pages of her two former students who did much of this work.  My explanations above are no doubt over simplifications and contain errors which are purely my own.  If this stuff really interests you check out the publications below which contain the research I described above.

Kevin Aagaard

Orin Robinson


25 September 2015

Brownian Motion

If you want to skip all the code and Shiny info you can jump straight to the shiny app: Brownian motion simulator.

In comparative phylogenetics Brownian motion is often either a null model that we compare observations to or is a de facto assumption that our methods depend on.  For this reason, I think it can be useful to have nice interactive ways for students to explore this model and develop an intuitive understanding of how it behaves.  The R package Shiny offers a great environment to develop something like this.  Below you can see just how easy it is to create a slick and responsive web app.

The first step is designing a file called “ui.R”, which is the user interface.  It consists of two primary code chunks.  The first is the sidebarLayout, which describes the part of the page that the user will interact with.  The Shiny package has many widgets built in, and in this example I use two of these: sliderInput and actionButton.  sliderInput creates a sliding widget and lets you set the min and max value for the variable as well as an initial starting value.  The actionButton widget creates a variable that initially has a value of 0 and is incremented by 1 each time the button is pressed.  I use the actionButton to act as the seed for the stochastic portion of my code.  The second part of this simple ui.R file specifies the output to place next to the sidebar.  In this example I simply call the object “distPlot”, which is created in the server page, and I also specify the vertical size to be larger than it would be by default.

The next component of the Shiny app is the “server.R” file.  This is the code that will be run to generate the output portion of the web app.  It consists of two components in this example.  The first is the reactive component that holds the result of the Brownian motion simulation.  By using replicate we create a matrix where the rows represent generations and the columns represent iterations.  By placing it inside a reactive command we ensure that it will only be rerun when one of the underlying input variables changes.  If you look at the code you can see that these include: input$seed.val, input$reps, input$gens.  In contrast, if another variable such as input$mon changes, “x” will not be recalculated.
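Stripped of the Shiny wrapper, the simulation step can be sketched in a few lines of base R.  This is a minimal sketch rather than the app's exact code: the fixed values of `reps` and `gens` stand in for the slider inputs, and the unit-variance normal steps are an assumption.

```r
set.seed(1)   # in the app the seed would come from the action button
reps <- 20    # number of independent iterations (columns)
gens <- 100   # number of generations (rows)

# Each replicate is one random walk: the running sum of normal steps.
x <- replicate(reps, cumsum(rnorm(gens, mean = 0, sd = 1)))

dim(x)        # 100 20: generations in rows, iterations in columns
```

Because each column is an independent walk, the spread of values across a row grows with the generation number, which is the behavior the app is meant to make visible.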

The second element of the code is the actual plot I want output.  Because x is a matrix we can use matplot to plot the results quickly and easily.  I choose to use a new color palette, viridis.  However, we want to make this interactive by adding a histogram of any time point on the graph.  We do this by using the argument input$mon to select a specific row of x to plot.  This determines what data are used for the second graph and adds a vertical line indicating what time point is being plotted.
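Putting the two files together, a single-file version of an app like this might look as follows.  Treat it as a sketch under assumptions: the input names (seed.val, reps, gens, mon) and the output name distPlot come from the description above, but the slider ranges, labels, plot height, and layout are made up, and base R's hcl.colors(..., "viridis") is swapped in for the viridis package to avoid an extra dependency.

```r
library(shiny)

# User interface: sidebar with the inputs, main panel with the plot.
ui <- fluidPage(
  titlePanel("Brownian motion simulator"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("gens", "Generations:", min = 10, max = 500, value = 100),
      sliderInput("reps", "Replicates:", min = 2, max = 100, value = 20),
      sliderInput("mon", "Generation shown in histogram:",
                  min = 1, max = 500, value = 50),
      actionButton("seed.val", "Refresh simulation")
    ),
    mainPanel(plotOutput("distPlot", height = "600px"))
  )
)

server <- function(input, output) {
  # Rerun only when seed.val, reps, or gens change (not when mon changes).
  x <- reactive({
    set.seed(as.integer(input$seed.val))
    replicate(input$reps, cumsum(rnorm(input$gens)))
  })

  output$distPlot <- renderPlot({
    sims <- x()
    mon  <- min(input$mon, nrow(sims))  # keep the row index inside the matrix
    par(mfrow = c(2, 1))
    matplot(sims, type = "l", lty = 1,
            col = hcl.colors(ncol(sims), "viridis"),
            xlab = "Generation", ylab = "Trait value")
    abline(v = mon, lwd = 2)            # marks the generation in the histogram
    hist(sims[mon, ], xlab = "Trait value",
         main = paste("Trait values at generation", mon))
  })
}

app <- shinyApp(ui = ui, server = server)
if (interactive()) runApp(app)
```

Moving the mon slider only redraws the plot; the simulated matrix is reused until the button or one of the other two sliders changes.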

Here is what the finished product looks like:

05 September 2015

Ternary Diagram - Working Example

A couple of years ago I posted a bit of code that lets you plot data in a ternary diagram.  This is a useful way to display proportion data when proportions are split between three possible states.  Unfortunately, my original post included a link to the underlying data in my dropbox account.  This has led to me accidentally deleting it a couple of times. 

Well I did it again.  I decided that the best solution would be to just post some new code that included simulation of the data.

The original data were generated under a phylogenetic model and represented the number of species in each of 3 possible states.  If you would like to regenerate data for that approach you will need an R package that allows for discrete data simulation on a phylogenetic tree.  A fairly easy to install package that can do this is geiger.  Below is the code for generating an appropriate dataset.
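A minimal sketch of the phylogenetic version, assuming geiger's sim.char with model = "discrete" and a random coalescent tree from ape.  The tree size, number of simulations, and the equal-rates matrix are all made-up values, not the settings behind the original figure.

```r
library(ape)     # rcoal() for a random tree
library(geiger)  # sim.char() for character simulation

set.seed(1)
n_tips <- 100    # species per simulated dataset (hypothetical)
n_sims <- 50     # number of simulated datasets (hypothetical)
tree   <- rcoal(n_tips)

# Hypothetical 3-state rate matrix: equal rates, rows sum to zero.
q <- matrix(0.5, nrow = 3, ncol = 3)
diag(q) <- -1

# One discrete character simulated n_sims times on the same tree.
# The result is an array: tips x characters x simulations.
sims <- sim.char(tree, par = list(q), nsim = n_sims,
                 model = "discrete", root = 1)

# Count the species in each of the 3 states for every simulation.
counts <- t(apply(sims[, 1, ], 2, tabulate, nbins = 3))
colnames(counts) <- c("state1", "state2", "state3")
```

Each row of `counts` is one simulated dataset, which is the shape a ternary plot needs.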

If you are not interested in phylogenetics you can generate an appropriate dataset in a non-phylogenetic framework like this:
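One way to do this in base R is sketched below.  The approach is my assumption for illustration: each dataset gets its own random set of three state probabilities (normalized gamma draws), and species are then assigned to states with a multinomial draw.  The dataset and species counts are made up.

```r
set.seed(1)
n_sets    <- 50    # number of simulated datasets (hypothetical)
n_species <- 100   # species per dataset (hypothetical)

# Random probabilities for the three states, one row per dataset.
probs <- matrix(rgamma(n_sets * 3, shape = 1), ncol = 3)
probs <- probs / rowSums(probs)

# Assign species to states; each row of `counts` sums to n_species.
counts <- t(apply(probs, 1, function(p) {
  rmultinom(1, size = n_species, prob = p)
}))
colnames(counts) <- c("state1", "state2", "state3")
```

Varying the probabilities between datasets spreads the points across the triangle instead of clustering them around one spot.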

Finally with either of these datasets you can then plot using the vcd package. 
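A minimal plotting sketch using vcd's ternaryplot.  The placeholder data, point style, and title are assumptions for illustration; substitute the counts matrix from whichever simulation you ran.

```r
library(vcd)

set.seed(1)
# Placeholder data: 50 datasets of 100 species split among three states.
counts <- t(rmultinom(50, size = 100, prob = c(0.5, 0.3, 0.2)))
colnames(counts) <- c("state1", "state2", "state3")

# Convert counts to proportions so every row sums to 1.
props <- counts / rowSums(counts)

# One point per dataset; the corners are labeled with the column names.
ternaryplot(props, scale = 1, pch = 16, cex = 0.6, col = "steelblue",
            main = "Species in each of three states")
```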

This script will then produce: