regex - I cannot seem to get my regular expressions in Perl to recognize underscore (_) characters
Sorry to bring up such a simple problem, but I am trying to write a series of regular expressions in Perl to extract certain types of data from a file. For some reason, I cannot seem to get Perl to match lines of data that have an underscore (_) in them.
If I want the lines that start with
"ch2 flybase exon "
or
"ch3 flybase exon "
(the white spaces are tab characters), the following code works well:
if ($_ =~ m/^ch[ 2-3] flybase exon /) {print outputfile;}
However, if I want to match lines with more complex chromosome names (i.e. more than the letters 'ch' followed by a number), such as:
ch4_group1 ch4_group2 ch4_group3 ch4_group4 ch4_group5 chxl_group1a chxl_group1e chxl_group3a chxl_group3b chxr_group3a chxr_group5 chxr_group6 chxr_group8 unknown_group_1 unknown_group_10 unknown_group_100 unknown_group_101
I have tried the following code without success:
if ($_ =~ m/^ch4_group[1-5] flybase exon /) {print outputfile;}
if ($_ =~ m/^chx._group[0-9]+[a-z]* flybase exon /) {print outputfile;}
if ($_ =~ m/^unknown_group_[0-9]+ flybase exon /) {print outputfile;}
if ($_ =~ m/^unknown_singleton_[0-9]+ flybase exon /) {print outputfile;}
I have also tried including a \ in front of the _, but that did not help.
Any suggestions would be appreciated.
Assuming you're using the x, m, and i options, make the following changes:
^ch4_group[1-5] flybase exon
should be:
^ch4_group[1-5]\s*flybase\sexon\s*$
^chx._group[0-9]+[a-z]* flybase exon
should be:
^chx._group[0-9]+[a-z]*\s+flybase\sexon\s*$
^unknown_group_[0-9]+ flybase exon
should be:
^unknown_group_[0-9]+\s*flybase\sexon\s*$
^unknown_singleton_[0-9]+ flybase exon
should be:
^unknown_singleton_[0-9]+\s*flybase\sexon\s*$
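For what it's worth, the underscore is not a metacharacter in Perl regexes, so it matches literally with or without a backslash. The whitespace in the pattern is the more likely culprit, which is why the changes above swap it for \s. Here is a minimal sketch of that idea, not a drop-in fix. The sample lines and the names @lines, $exon_re and @matched are made up for illustration, and the trailing numeric columns are an assumption about the file layout. It also leaves off the \s*$ anchor, since the question asks for lines that start with the prefix and presumably carry more columns after "exon".

```perl
use strict;
use warnings;

# Hypothetical sample lines; the columns are tab-separated as in the question.
my @lines = (
    "ch2\tflybase\texon\t100\t200",
    "ch4_group3\tflybase\texon\t100\t200",
    "chxl_group1a\tflybase\texon\t5\t50",
    "unknown_group_101\tflybase\texon\t7\t70",
    "unknown_singleton_12\tflybase\texon\t9\t90",
    "ch4_group3\tflybase\tintron\t100\t200",   # should not match
);

# One pattern for all chromosome names from the question.
# The underscore needs no escaping. Under /x, literal whitespace in the
# pattern is ignored, so the tab separators are matched with \s+ instead.
my $exon_re = qr/
    ^(?: ch[23]
       | ch4_group[1-5]
       | chx._group[0-9]+[a-z]*
       | unknown_(?:group|singleton)_[0-9]+
     )
    \s+ flybase \s+ exon \s
/xi;

# Keep only the lines that start with "<chromosome> flybase exon ".
my @matched = grep { $_ =~ $exon_re } @lines;
print "$_\n" for @matched;
```

If the original patterns still fail after this kind of change, it may be worth printing a problem line with its whitespace made visible to check whether the separators really are single tabs.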