Sliding window analyses are a common approach for examining very large datasets such as genomic data.  A friend is working to find regions of the genome that may show signs of involvement in a complex trait, and part of this analysis required that he calculate the mean value of a measure for each site in the genome.  The figure below shows the goal of such a function, using a window size of 4 and a step size of 2.
There may be a ready-made way to do this in R, but I was unable to find one quickly, so I went ahead and wrote my own.  First I made some data to work with:

data <- c(runif(100000, min=0, max=.1),   # count of 100000 assumed to match the other segments
          runif(100000, min=.05, max=.1),
          runif(10000, min=.05, max=1),
          runif(100000, min=0, max=.2))

Then I wrote a simple function:

slideFunct <- function(data, window, step){
  total <- length(data)
  # start position of each window; the last window ends at the final data point
  spots <- seq(from=1, to=(total-window+1), by=step)
  result <- vector(length=length(spots))
  for(i in 1:length(spots)){
    # mean of the 'window' values beginning at spots[i]
    result[i] <- mean(data[spots[i]:(spots[i]+window-1)])
  }
  return(result)
}
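
As a quick check, here is a small worked example using a window size of 4 and a step size of 2. The vector `x` is made up for illustration, and the function is written out in full so the snippet runs on its own.

```r
# Sliding-window mean, written out in full so this snippet is self-contained.
slideFunct <- function(data, window, step){
  total <- length(data)
  spots <- seq(from = 1, to = (total - window + 1), by = step)
  result <- vector(length = length(spots))
  for(i in 1:length(spots)){
    result[i] <- mean(data[spots[i]:(spots[i] + window - 1)])
  }
  return(result)
}

# Hypothetical toy data: the integers 1 to 10.
x <- 1:10

# Windows start at positions 1, 3, 5, 7 and each covers 4 values.
small <- slideFunct(x, window = 4, step = 2)
print(small)
# [1] 2.5 4.5 6.5 8.5
```

One thing to watch: if `window` is larger than `length(data)`, the `to` value passed to `seq()` falls below `from`, and `seq()` stops with the error `wrong sign in 'by' argument`. That is one way this error can arise.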

Here is the result of running this function and plotting the output with window sizes of 2, 20, 200, and 400.
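
A minimal sketch of how a plot like this could be produced with base graphics. The step size of 1, the 2-by-2 panel layout, the fixed seed, and the scaled-down simulated data are all assumptions made for illustration; the post does not say what was used for the figure.

```r
# Sliding-window mean, written out in full so this snippet is self-contained.
slideFunct <- function(data, window, step){
  total <- length(data)
  spots <- seq(from = 1, to = (total - window + 1), by = step)
  result <- vector(length = length(spots))
  for(i in 1:length(spots)){
    result[i] <- mean(data[spots[i]:(spots[i] + window - 1)])
  }
  return(result)
}

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
# Hypothetical, scaled-down data with four segments of differing spread.
data <- c(runif(1000, min = 0, max = .1),
          runif(1000, min = .05, max = .1),
          runif(100, min = .05, max = 1),
          runif(1000, min = 0, max = .2))

windows <- c(2, 20, 200, 400)
# Step of 1 is an assumption; the step used for the figure is not stated.
results <- lapply(windows, function(w) slideFunct(data, window = w, step = 1))

old_par <- par(mfrow = c(2, 2))  # 2-by-2 grid of panels
for(i in seq_along(windows)){
  plot(results[[i]], type = "l",
       main = paste("window =", windows[i]),
       xlab = "window index", ylab = "mean")
}
par(old_par)  # restore the previous graphics settings
```

Larger windows average over more points, so the plotted line becomes smoother as the window size grows.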




Hopefully this will be helpful to someone.

cheers


Comments

  1. Shouldn't that be result[i] <- mean(data[spots[i]:(spots[i]+window-1)]) unless you want a window size of window + 1?

    Replies
    1. Nice catch, I agree with you! Code updated!

  2. I get the error 'wrong sign in "by" argument' after running your code with the test data and window size of 200 and step of 2, followed by plot(slideFunct)

  3. Also, I believe you want to define spots as:
    spots <- seq(from = 1, to = (total - window + 1), by = step)
    since you want to include the last window too.

    Replies
    1. Thanks for posting this, you are correct. The code has been updated.

      cheers

  4. Can you provide an extension to this code such that you could take the mean over a two dimensional section of a two dimensional dataframe or matrix?
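
In response to the last question, one possible way to extend the idea to two dimensions is sketched below. This is not from the original post: the function name `slideFunct2D` and its arguments (`window_r`, `window_c`, `step_r`, `step_c`) are hypothetical, and the sketch assumes numeric input that `as.matrix()` can convert cleanly.

```r
# Hypothetical 2D analogue of slideFunct: mean over a sliding rectangular block.
slideFunct2D <- function(mat, window_r, window_c, step_r = 1, step_c = 1){
  mat <- as.matrix(mat)
  # Guard: seq() would fail if a window were larger than the matrix dimension.
  stopifnot(window_r <= nrow(mat), window_c <= ncol(mat))
  # Top-left corner (row and column) of each block.
  rows <- seq(from = 1, to = nrow(mat) - window_r + 1, by = step_r)
  cols <- seq(from = 1, to = ncol(mat) - window_c + 1, by = step_c)
  result <- matrix(NA_real_, nrow = length(rows), ncol = length(cols))
  for(i in seq_along(rows)){
    for(j in seq_along(cols)){
      block <- mat[rows[i]:(rows[i] + window_r - 1),
                   cols[j]:(cols[j] + window_c - 1)]
      result[i, j] <- mean(block)
    }
  }
  result
}

# Toy example: a 4 x 4 matrix filled column-wise with 1..16,
# summarised in non-overlapping 2 x 2 blocks.
m <- matrix(1:16, nrow = 4)
out <- slideFunct2D(m, window_r = 2, window_c = 2, step_r = 2, step_c = 2)
print(out)
#      [,1] [,2]
# [1,]  3.5 11.5
# [2,]  5.5 13.5
```

The nested loop mirrors the one-dimensional version for readability; for large matrices a vectorised approach would likely be faster.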