Using R to select data based on another dataset


I have a large dataset (d1) like the one below:

snp   position  chromosome
rs1   10010     1
rs2   10020     1
rs3   10030     1
rs4   10040     1
rs5   10010     2
rs6   10020     2
rs7   10030     2
rs8   10040     2
rs9   10010     3
rs10  10020     3
rs11  10030     3
rs12  10040     3

I also have a second dataset (d2):

snp  position  chromosome
rsa  10015     1
rsb  10035     3
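To make the example reproducible, the two tables can be typed into R directly. This is a sketch assuming `position` and `chromosome` are numeric, as they appear above; `stringsAsFactors = FALSE` keeps the SNP names as plain strings in R versions before 4.0.

```r
# d1: twelve SNPs, four positions on each of chromosomes 1 to 3
d1 <- data.frame(
  snp        = paste0("rs", 1:12),
  position   = rep(c(10010, 10020, 10030, 10040), times = 3),
  chromosome = rep(1:3, each = 4),
  stringsAsFactors = FALSE
)

# d2: the two query SNPs
d2 <- data.frame(
  snp        = c("rsa", "rsb"),
  position   = c(10015, 10035),
  chromosome = c(1, 3),
  stringsAsFactors = FALSE
)
```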

Now I want to select, for each SNP in d2, the SNPs in d1 that lie on the same chromosome within ±5 of its position, and write the results to a txt file. The results should look like this:

snp(d2)  snp(d1)  position(d1)  chromosome
rsa      rs1      10010         1
rsa      rs2      10020         1
rsb      rs11     10030         3
rsb      rs12     10040         3

I am new to R. Can anyone tell me how to do this in R? Any kind of reply is highly appreciated.

First, add the lower and upper bounds of the ±5 window to d2:

d2$low  <- d2$position - 5
d2$high <- d2$position + 5

You might think something like this would work:

d2$matched <- which(d1$position >=d2$low & d2$high >= d1$position) 
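On the example data the problem is easy to see. The sketch below re-creates d1 and d2 from the tables in the question and evaluates the condition on its own. Because d1 has twelve rows and d2 only two, R recycles d2's bounds along d1, so each row of d1 is tested against just one row of d2 (rsa, rsb, rsa, rsb, ...), and the chromosome is never checked.

```r
d1 <- data.frame(
  snp        = paste0("rs", 1:12),
  position   = rep(c(10010, 10020, 10030, 10040), times = 3),
  chromosome = rep(1:3, each = 4),
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  snp        = c("rsa", "rsb"),
  position   = c(10015, 10035),
  chromosome = c(1, 3),
  stringsAsFactors = FALSE
)
d2$low  <- d2$position - 5
d2$high <- d2$position + 5

# Element-wise comparison with recycling: odd rows of d1 are compared with
# rsa's window, even rows with rsb's window
naive <- which(d1$position >= d2$low & d2$high >= d1$position)
print(naive)
# [1]  1  4  5  8  9 12
```

Only rows 1 and 12 are real matches. Rows 4, 5, 8 and 9 pass because they were compared against the wrong d2 row or sit on the wrong chromosome, while the genuine matches rs2 and rs11 are missed. Assigning this six-element result to `d2$matched` also fails, since d2 has only two rows.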

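... but it doesn't really work: it compares the two tables element by element instead of checking every row of d1 against every row of d2, and it ignores the chromosome. You need something a bit more involved: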

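d1$matched <- apply(d1, 1, function(p) {
  # apply() passes each row as a character vector, so convert the
  # position back to a number and trim any padding from the chromosome
  pos <- as.numeric(p['position'])
  which(pos >= d2[, 'low'] &
        d2[, 'high'] >= pos &
        trimws(p['chromosome']) == d2[, 'chromosome'])
})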

This checks, for each row of d1, whether any row of d2 on the same chromosome has a window that contains that row's position. Then take a look at d1 and bind the matching cases together:

d1   # take a look at d1, which now has a 'matched' column
# bind matching cases: rows of d1 with at least one match, next to the
# d2 rows they matched (assumes each d1 row matches at most one d2 row)
hits <- which(lengths(d1$matched) > 0)
cbind(d1[hits, ], d2[unlist(d1$matched[hits]), ])
#--------------------
    snp position chromosome matched snp position chromosome   low  high
1   rs1    10010          1       1 rsa    10015          1 10010 10020
2   rs2    10020          1       1 rsa    10015          1 10010 10020
11 rs11    10030          3       2 rsb    10035          3 10030 10040
12 rs12    10040          3       2 rsb    10035          3 10030 10040
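The question also asks for the result in a specific layout, written to a txt file. A minimal alternative sketch, assuming numeric `position` and `chromosome` columns as in the question, pairs the two tables on chromosome with base R's `merge()` and then keeps the pairs within the window. The `window` variable and the output file name `matches.txt` are placeholders.

```r
d1 <- data.frame(
  snp        = paste0("rs", 1:12),
  position   = rep(c(10010, 10020, 10030, 10040), times = 3),
  chromosome = rep(1:3, each = 4),
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  snp        = c("rsa", "rsb"),
  position   = c(10015, 10035),
  chromosome = c(1, 3),
  stringsAsFactors = FALSE
)

window <- 5  # half-width of the search window around each d2 position

# Pair every d2 SNP with every d1 SNP on the same chromosome; the shared
# column names get the suffixes .d2 and .d1
pairs <- merge(d2, d1, by = "chromosome", suffixes = c(".d2", ".d1"))

# Keep only pairs whose d1 position lies within +/- window of the d2 position
in_window <- pairs[abs(pairs$position.d1 - pairs$position.d2) <= window, ]

# Rearrange into the layout asked for in the question
result <- data.frame(
  snp_d2      = in_window$snp.d2,
  snp_d1      = in_window$snp.d1,
  position_d1 = in_window$position.d1,
  chromosome  = in_window$chromosome,
  stringsAsFactors = FALSE
)
result <- result[order(result$chromosome, result$position_d1), ]

# Write a tab-separated text file with the question's column headers
write.table(result, "matches.txt", sep = "\t", quote = FALSE,
            row.names = FALSE,
            col.names = c("snp(d2)", "snp(d1)", "position(d1)", "chromosome"))
```

Because `merge()` builds every same-chromosome pair before filtering, memory use grows with the product of the per-chromosome counts. For a very large d1, an interval join such as `foverlaps()` from the data.table package may scale better.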
