Sliding window analyses are a common approach when examining
very large datasets such as genomic data.
A friend is working to find regions of the genome that may show signs of
involvement in a complex trait, and part of this analysis required that he calculate
the mean value of a measure for each site in the genome. The figure below shows the goal of such a
function, using a window size of 4 and a
step size of 2:
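As a small illustration of that windowing scheme, here is a sketch on a toy vector. The vector `x` and the variable names are made up for the example; only the window size of 4 and the step size of 2 come from the description above.

```r
# Toy input, assumed purely for illustration
x <- 1:10
window <- 4
step <- 2

# Start position of each window; the last window must still fit in x
starts <- seq(from = 1, to = length(x) - window + 1, by = step)

# Elements covered by each window, and the mean of each window
windows <- lapply(starts, function(s) x[s:(s + window - 1)])
window_means <- vapply(windows, mean, numeric(1))

print(starts)        # 1 3 5 7
print(window_means)  # 2.5 4.5 6.5 8.5
```

Each window overlaps the previous one by two elements, because the step is half the window size.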
There may be a ready-made way to do this in R, but I was unable to quickly find it, so I just went ahead and wrote one up. First I made some data to work with:
# the length of the first segment is assumed to be 100000, like the others
data <- c(runif(100000, min=0, max=.1),
          runif(100000, min=.05, max=.1),
          runif(10000, min=.05, max=1),
          runif(100000, min=0, max=.2))
Then I wrote a simple function:
slideFunct <- function(data, window, step){
  total <- length(data)
  # start position of every window, including the last full one
  spots <- seq(from=1, to=(total - window + 1), by=step)
  result <- vector(length = length(spots))
  for(i in 1:length(spots)){
    # mean of the 'window' values starting at spots[i]
    result[i] <- mean(data[spots[i]:(spots[i] + window - 1)])
  }
  return(result)
}
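For long vectors the loop can get slow, so here is one possible variation that is not part of the post itself: a sketch that gets the same window means from cumulative sums. The name `slideMeanFast` and the input checks are my own assumptions. The checks are there because `seq()` stops with an error when `to` is smaller than `from` and `by` is positive, which happens if the window is longer than the data.

```r
# Hypothetical helper (an assumption, not the post's function):
# sliding-window mean computed from cumulative sums instead of a loop.
slideMeanFast <- function(data, window, step) {
  total <- length(data)
  # Fail early with a clear condition instead of an error from seq()
  stopifnot(window >= 1, step >= 1, window <= total)
  spots <- seq(from = 1, to = total - window + 1, by = step)
  # csum[k + 1] holds the sum of data[1:k]; csum[1] is 0
  csum <- c(0, cumsum(data))
  # Sum of each window is a difference of two cumulative sums
  (csum[spots + window] - csum[spots]) / window
}

x <- c(2, 4, 6, 8, 10, 12)
slideMeanFast(x, window = 4, step = 2)  # windows 1:4 and 3:6, means 5 and 9
```

One trade-off to keep in mind: cumulative sums over very long vectors can pick up small floating-point rounding differences compared with calling `mean()` on each window directly.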
Here is the result of running slideFunct and plotting the output with window sizes of 2, 20, 200, and 400.
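The plotting code is not shown above, so the following is only a sketch of how such a figure could be produced. The seed, the step size (half of each window, with a minimum of 1), and the 2 x 2 panel layout are all assumptions on my part; the data and the four window sizes follow the text.

```r
set.seed(1)  # assumed seed, only so the sketch is repeatable

# Test data as described in the post (first segment length assumed)
data <- c(runif(100000, min = 0, max = .1),
          runif(100000, min = .05, max = .1),
          runif(10000, min = .05, max = 1),
          runif(100000, min = 0, max = .2))

# Sliding-window mean, following the function given in the post
slideFunct <- function(data, window, step) {
  total <- length(data)
  spots <- seq(from = 1, to = (total - window + 1), by = step)
  result <- vector(length = length(spots))
  for (i in seq_along(spots)) {
    result[i] <- mean(data[spots[i]:(spots[i] + window - 1)])
  }
  result
}

window_sizes <- c(2, 20, 200, 400)
# Assumed step: half the window size, but never less than 1
results <- lapply(window_sizes, function(w) {
  slideFunct(data, window = w, step = max(1, w / 2))
})
names(results) <- window_sizes

# Draw one panel per window size when run in an interactive session
if (interactive()) {
  old_par <- par(mfrow = c(2, 2))
  for (w in names(results)) {
    plot(results[[w]], type = "l", main = paste("window =", w),
         xlab = "window index", ylab = "mean")
  }
  par(old_par)
}
```

Larger windows smooth out more of the noise, which makes the short high-variance stretch in the third segment of the test data easier to see.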
Hopefully this will be helpful to someone.
cheers
Shouldn't that be result[i] <- mean(data[spots[i]:(spots[i]+window-1)]) unless you want a window size of window + 1?
Nice catch, I agree with you! Code updated!
I get the error 'wrong sign in "by" argument' after running your code with the test data and window size of 200 and step of 2, followed by plot(slideFunct)
Also, I believe you want to define spots as:
spots <- seq(from = 1, to = (total - window + 1), by = step)
since you want to include the last window too.
Thanks for posting this; you are correct. The code has been updated.
cheers
Can you provide an extension to this code such that you could take the mean over a two dimensional section of a two dimensional dataframe or matrix?