Using R to select data based on another dataset
I have a large dataset (d1) like the one below:

snp    position  chromosome
rs1    10010     1
rs2    10020     1
rs3    10030     1
rs4    10040     1
rs5    10010     2
rs6    10020     2
rs7    10030     2
rs8    10040     2
rs9    10010     3
rs10   10020     3
rs11   10030     3
rs12   10040     3
I also have a second dataset (d2):

snp    position  chromosome
rsa    10015     1
rsb    10035     3
Now I want to select the SNPs in d1 that fall within a range defined by d2 (position +/- 5, same chromosome) and write the results to a txt file. The results should look like this:

snp(d2)  snp(d1)  position(d1)  chromosome
rsa      rs2      10020         1
rsa      rs3      10030         1
rsb      rs11     10030         3
rsb      rs12     10040         3
I am new to R. Can you please tell me how to do this in R? Any kind of reply is highly appreciated.
First add the lower and upper bound of each window to d2:

d2$low  <- d2$position - 5
d2$high <- d2$position + 5
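You might think something like this would work:

d2$matched <- which(d1$position >= d2$low & d2$high >= d1$position)

... but not really. That comparison just recycles the two d2 rows along the d1 positions and ignores the chromosome, so it needs to be a bit more involved: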
# for each d1 row, the indices of the d2 rows whose window contains it
d1$matched <- lapply(seq_len(nrow(d1)), function(i)
    which(d1$position[i] >= d2$low & d2$high >= d1$position[i] &
          d1$chromosome[i] == d2$chromosome))
This basically checks, for each row of d1, whether there is a potential match in d2 that is in range and on the same chromosome:
# take d1 and bind the matching cases from d2
hit <- lengths(d1$matched) > 0
cbind(d1[hit, ], d2[unlist(d1$matched[hit]), ])
#--------------------
    snp position chromosome matched snp position chromosome   low  high
1   rs1    10010          1       1 rsa    10015          1 10010 10020
2   rs2    10020          1       1 rsa    10015          1 10010 10020
11 rs11    10030          3       2 rsb    10035          3 10030 10040
12 rs12    10040          3       2 rsb    10035          3 10030 10040
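
The question also asks for the result to be written to a text file, which the code above does not cover. As an alternative, here is a minimal sketch using base R's merge() and write.table(). It recreates the example data from the question so it can be run on its own. The file name matched_snps.txt, the variable names, and the .d2/.d1 column suffixes are my own choices for illustration, not anything from the original post.

```r
# Example data, recreated from the tables in the question
d1 <- data.frame(
  snp        = paste0("rs", 1:12),
  position   = rep(c(10010, 10020, 10030, 10040), times = 3),
  chromosome = rep(1:3, each = 4),
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  snp        = c("rsa", "rsb"),
  position   = c(10015, 10035),
  chromosome = c(1L, 3L),
  stringsAsFactors = FALSE
)

# Pair every d2 SNP with every d1 SNP on the same chromosome
pairs <- merge(d2, d1, by = "chromosome", suffixes = c(".d2", ".d1"))

# Keep only the pairs whose d1 position lies within +/- 5 of the d2 position
window <- 5  # half-width of the range, taken from the question
hits <- pairs[abs(pairs$position.d1 - pairs$position.d2) <= window, ]

# Put the columns in the requested order
result <- hits[, c("snp.d2", "snp.d1", "position.d1", "chromosome")]

# Write a tab-delimited text file (hypothetical file name)
write.table(result, file = "matched_snps.txt", sep = "\t",
            quote = FALSE, row.names = FALSE)
```

With a window of 5 this pairs rsa with rs1 and rs2, and rsb with rs11 and rs12, the same pairs as in the cbind() output above. One caveat: merge() first builds every same-chromosome pair before filtering, so on really large datasets the intermediate table can get big and a row-by-row or interval-based approach may be the better fit.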