Technically today is a holiday in the US... At least I've yet to see anyone else make it into the office.  Therefor, it seems like a good time for a couple of less serious R functions that I wrote just for fun.


wordsTOcodons
enigma machine

This simple function takes a single argument of a text string containing any of the letters A to Z and spaces.  It treats this as a protein sequence represented with single letter IUPAC amino acid codes.  It converts this to the DNA sequence that would code for this protein.  This function makes use of a couple of codons that are normally stop codons but in certain organisms have been coopted to code for unusual amino acids.  This is how we are able to expand the alphabet to include O and U.  We further expand the alphabet to include B, J, X, and Z by utilizing ambiguity abbreviations from IUPAC

U is the abbreviation for selenocysteine it is encoded by a TGA codon, which is normally a stop codon.
O is the abbreviation for pyrrolysine it is encoded by a TAG codon, also normally a stop codon.
B is the abbreviation for aspartic acid (GAT) or asparagine (AAT) so we represent this with RAT since R is the ambiguity code for A or G
J is the abbreviation for leucine (CTT) or isoleucine (ATT) so we represent this with  MTT since M is the ambiguity code for C or A.
X is the abbreviation for any amino acid so we represent it with NNN
Z is the abbreviation for glutamic acid (GAA) or glutamine (CAA) so we represent this with SAA since S is the ambiguity code for C or G


wordTOcodons <- function(x){
  table <- matrix(c('A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I',
                    'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', 'O', 'U',
                    'B', 'J', 'X', 'Z', ' ', 
                    'GCT', 'CGC', 'AAT', 'GAT', 'TGT', 'CAG', 
                    'GAA', 'GGT', 'CAT', 'ATT', 'TTA', 'AAG', 'ATG', 'TTC', 
                    'CCA', 'TCA', 'ACT', 'TGG', 'TAC', 'GTT', 'TAG', 'TGA', 
                    'RAT', 'MTT', 'NNN', 'SAA', ' '),27,2)
  x <- toupper(x)
  x <- strsplit(x, "")[[1]]
  result <- list()
  for(i in 1:length(x)){
    result[[i]] <- table[table[,1] == x[i] , 2]
    if(!x[i] %in% table[,1]){
      result[[i]] <- '-'
    }
  }
  return(paste(unlist(result), sep="", collapse=" "))
}


So we can take some simple text and quickly convert it:

> wordTOcodons("Heath Blackmon")
[1] "CAT GAA GCT ACT CAT RAT TTA GCT TGT AAG ATG TAG AAT"



Dawkins Monkeys
typing monkeys


Inspired by Dawkins application of the infinite monkey theorem to evolution in his book the Blind Watchmaker these two function recapitulate his two examples 1) random mutation no inheritance 2) random mutation natural selection with inheritance.

This is evolution by random mutation


RandomMonkey <- function(target){
  target <- toupper(target)
  i <- 0
  ptm <- proc.time()
  target <- strsplit(target, split="")[[1]]
  mutations <- c(toupper(letters)," ", ".",",",";")
  result <- sample(mutations, size=length(target), replace=T)
  check <- F
  while(check == F){
    result <- sample(mutations, size=length(target), replace=T)
    if(sum(result == target) == length(target)) check <- T
    mx <- 40
    if(length(result)<40) mx <- length(result)
    cat(paste(paste(result[1:mx], collapse=""), "\n"))
    i <- i + 1
  }
  x<-proc.time() - ptm
  cat(paste(round(x[1], digits=4), "time elapsed. Perfect match required", i, "attempts"))
}

Then we have evolution by natural selection


SelectedMonkey <- function(target){
  target <- toupper(target)
  i <- 0
  ptm <- proc.time()
  target <- strsplit(target, split="")[[1]]
  mutations <- c(toupper(letters)," ", ".",",",";")
  result <- sample(mutations, size=length(target), replace=T)
  check <- F
  while(check == F){
    result[result!=target] <- sample(mutations, size = sum(result!=target), replace=T)
    if(sum(result == target) == length(target)) check <- T
    mx <- 40
    if(length(result)<40) mx <- length(result)
    cat(paste(paste(result[1:mx], collapse=""), "\n"))
    i <- i + 1
  }
  x<-proc.time() - ptm
  cat(paste(round(x[1], digits=4), "time elapsed. Perfect match required", i, "attempts"))
}


Now we can run these two and see just how efficient selection really is:



First we try "evolving" my sons name (Leslie) by just randomly picking letters the correct number of letters.  If we fail in any way we throw it out and try again.

RandomMonkey("leslie")
1877.117 time elapsed. Perfect match required 306,976,435 attempts


Next we still allow the letters in the name to mutate at random but if we fail to create my sons name this time we keep the letters that are good matches and try mutating the rest.


SelectedMonkey("leslie")
0.004 time elapsed. Perfect match required 84 attempts

The randomMonkey function simply won't complete running on many machines in a reasonable time if the target is more than 7 letters.  I choose to ignore capitalization but i do include spaces, commas, semicolons and periods.  This means the possible combinations is equal to 30 raised to the X power where X is the length of the target.

combinations

To forestall comments I should say that I am aware that Dawkins was not the first to apply this infinite monkey theorem to thinking about evolution. Likely one of the Huxley's did much earlier. However, it was Richard's discussion that made me want to write my own little snippet of code.







0

Add a comment

Great Blogs
Great Blogs
About Me
About Me
My Photo
I am broadly interested in the application and development of comparative methods to better understand genome evolution at all scales from nucleotides to chromosomes.
Subscribe
Subscribe
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.