FAQ 6.1 How can I hope to use regular expressions without creating illegible and unmaintainable code?


Posted on: Tuesday 27th of November 2012 11:51:36 PM | Total views: 351

RELATED TOPICS OF Perl PROGRAMMING LANGUAGE




FAQ 6.1 How can I hope to use regular expressions without creating illegible and unmaintainable code?

This is an excerpt from the latest version of perlfaq6.pod, which comes with the standard Perl distribution. These postings aim to reduce the number of repeated questions and to allow the community to review and update the answers. The latest version of the complete perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

6.1: How can I hope to use regular expressions without creating illegible and unmaintainable code?

Three techniques can make regular expressions maintainable and understandable.

Comments Outside the Regex

Describe what you're doing and how you're doing it, using normal Perl comments.

    # turn the line into the first word, a colon, and the
    # number of characters on the rest of the line
    s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;

Comments Inside the Regex

The "/x" modifier causes whitespace to be ignored in a regex pattern (except in a character class), and also allows you to use normal comments there. As you can imagine, whitespace and comments help a lot.

"/x" lets you turn this:

    s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;

into this:

    s{ <                    # opening angle bracket
        (?:                 # Non-backreffing grouping paren
            [^>'"] *        # 0 or more things that are neither > nor ' nor "
                |           #    or else
            ".*?"           # a section between double quotes (stingy match)
                |           #    or else
            '.*?'           # a section between single quotes (stingy match)
        ) +                 #   all occurring one or more times
       >                    # closing angle bracket
    }{}gsx;                 # replace with nothing, i.e. delete

It's still not as clear as prose, but it is very useful for describing the meaning of each part of the pattern.

Different Delimiters

While we normally think of patterns as being delimited with "/" characters, they can be delimited by almost any character. perlre describes this. For example, the "s///" above uses braces as delimiters. Selecting another delimiter can avoid quoting the delimiter within the pattern:

    s/\/usr\/local/\/usr\/share/g;    # bad delimiter choice
    s#/usr/local#/usr/share#g;        # better

--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They are not necessarily experts in every domain where Perl might show up, so please include as much information as possible and relevant in any corrections. The perlfaq-workers also don't have access to every operating system or platform, so please include relevant details for corrections to examples that do not work on particular platforms. Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in perlfaq.pod.
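The three FAQ 6.1 techniques can be checked side by side. The script below is a small sketch rather than part of the FAQ: the sample strings are made up for illustration, and the /x pattern is the FAQ's tag-stripping regex with shortened comments. It confirms that the compact and commented forms behave identically, and that changing the delimiter does not change what "s///" does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Technique 1: comments outside the regex.
# Turn each line into its first word (lowercased), a colon, and the
# number of characters on the rest of that line.
my $report = "Hello world\nPerl rocks\n";
$report =~ s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;
print $report;    # prints "hello:6" and "perl:6" on separate lines

# Technique 2: comments inside the regex via /x.
# Compact and commented versions of the same tag-stripping pattern,
# applied to a made-up sample containing ">" inside quoted attributes.
my $html = q{<a href="x>y">link</a> and <b title='1>0'>bold</b>};

(my $compact = $html) =~ s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;

(my $commented = $html) =~ s{ <          # opening angle bracket
        (?:                              # non-capturing group
            [^>'"]*                      # 0 or more chars other than > ' "
          | ".*?"                        # or a double-quoted section (stingy)
          | '.*?'                        # or a single-quoted section (stingy)
        )+                               # one or more of the above
        >                                # closing angle bracket
    }{}gsx;

print "$compact\n$commented\n";

# Technique 3: a different delimiter avoids escaping every "/".
(my $escaped = "/usr/local/bin") =~ s/\/usr\/local/\/usr\/share/g;
(my $hashes  = "/usr/local/bin") =~ s#/usr/local#/usr/share#g;
print "$escaped\n$hashes\n";
```

Both tag-stripping forms print "link and bold", and both path rewrites print "/usr/share/bin"; only readability differs.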
Posted on: Sunday 4th November 2012 | Total views: 130

FAQ 6.10 How do I use a regular expression to strip C style comments from a file?

--------------------------------------------------------------------

6.10: How do I use a regular expression to strip C style comments from a file?

While this actually can be done, it's much harder than you'd think. For example, this one-liner

    perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c

will work in many but not all cases. You see, it's too simple-minded for certain kinds of C programs, in particular, those with what appear to be comments in quoted strings. For that, you'd need something like this, created by Jeffrey Friedl and later modified by Fred Curtis.

    $/ = undef;
    $_ = <>;
    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
    print;

This could, of course, be more legibly written with the "/x" modifier, adding whitespace and comments. Here it is expanded, courtesy of Fred Curtis.

    s{
       /\*         ##  Start of /* ... */ comment
       [^*]*\*+    ##  Non-* followed by 1-or-more *'s
       (
         [^/*][^*]*\*+
       )*          ##  0-or-more things which don't start with /
                   ##    but do end with '*'
       /           ##  End of /* ... */ comment

     |         ##     OR  various things which aren't comments:

       (
         "           ##  Start of " ... " string
         (
           \\.           ##  Escaped char
         |               ##    OR
           [^"\\]        ##  Non "\
         )*
         "           ##  End of " ... " string

       |         ##     OR

         '           ##  Start of ' ... ' string
         (
           \\.           ##  Escaped char
         |               ##    OR
           [^'\\]        ##  Non '\
         )*
         '           ##  End of ' ... ' string

       |         ##     OR

         .           ##  Any other char
         [^/"'\\]*   ##  Chars which don't start a comment, string or escape
       )
     }{defined $2 ? $2 : ""}gxse;

A slight modification also removes C++ comments, as long as they are not spread over multiple lines using a continuation character:

    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;

--------------------------------------------------------------------
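To see why FAQ 6.10 bothers with the longer pattern, the sketch below wraps the naive one-liner's substitution, the Friedl/Curtis substitution, and the C++ variant in subs and runs them on made-up C snippets. The sub names and sample inputs are hypothetical; the regexes are the FAQ's own. The naive version also deletes the "comment" inside the string literal, while the longer ones keep strings intact.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The naive one-liner's substitution: removes every /* ... */ span,
# even one that sits inside a string literal.
sub strip_naive {
    my ($src) = @_;
    $src =~ s{/\*.*?\*/}{}gs;
    return $src;
}

# The Friedl/Curtis substitution: comments are matched by the first
# alternative (and dropped); strings and other text are captured in $2
# (and put back unchanged).
sub strip_c_comments {
    my ($src) = @_;
    $src =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
    return $src;
}

# The C++ variant: also drops // comments up to the end of the line.
sub strip_c_and_cpp_comments {
    my ($src) = @_;
    $src =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
    return $src;
}

my $c   = qq{int x = 1; /* a comment */\nchar *s = "not /* a comment */";\n};
my $cpp = qq{int y; // trailing\nchar *u = "http://x";\n};

print strip_naive($c);                 # the string literal loses its "comment"
print strip_c_comments($c);            # the string literal is left alone
print strip_c_and_cpp_comments($cpp);  # "//" inside the string survives
```

Like the FAQ's versions, this still does not understand every corner of C (for example, line continuations in // comments), so treat it as a demonstration rather than a preprocessor.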
Posted on: Sunday 4th November 2012 | Total views: 159

FAQ 6.15 How can I do approximate matching?

--------------------------------------------------------------------

6.15: How can I do approximate matching?

See the module String::Approx, available from CPAN.

--------------------------------------------------------------------
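FAQ 6.15 simply points to String::Approx. If you only want to see the underlying idea, one common measure of "approximately equal" is the Levenshtein edit distance: the number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a swapped-in technique, not String::Approx's API or implementation; `edit_distance` and `fuzzy_grep` are hypothetical helpers that compare whole strings only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Levenshtein distance between $s and $t, computed row by row so only
# two rows of the dynamic-programming table are kept in memory.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);          # distance from "" to each prefix of $t
    for my $i (1 .. length $s) {
        my @cur = ($i);                   # distance from $s's prefix to ""
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or exact match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical helper: keep candidates within $max edits of $target.
sub fuzzy_grep {
    my ($target, $max, @candidates) = @_;
    return grep { edit_distance($target, $_) <= $max } @candidates;
}

my @found = fuzzy_grep("perl", 1, qw(perl pearl peril Perl python));
print "@found\n";    # perl pearl peril Perl
```

Note that the comparison is case-sensitive, so "Perl" counts as one substitution away from "perl"; lowercase both sides first if that is not what you want.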
Posted on: Sunday 4th November 2012 | Total views: 139

FAQ 6.17 Why don't word-boundary searches with "\b" work for me?

--------------------------------------------------------------------

6.17: Why don't word-boundary searches with "\b" work for me?

(contributed by brian d foy)

Ensure that you know what \b really does: it's the boundary between a word character, \w, and something that isn't a word character. That thing that isn't a word character might be \W, but it can also be the start or end of the string.

It's not (not!) the boundary between whitespace and non-whitespace, and it's not the stuff between words we use to create sentences.

In regex speak, a word boundary (\b) is a "zero width assertion", meaning that it doesn't represent a character in the string, but a condition at a certain position.

For the regular expression, /\bPerl\b/, there has to be a word boundary before the "P" and after the "l". As long as something other than a word character precedes the "P" and follows the "l", the pattern will match. These strings match /\bPerl\b/.

    "Perl"    # no word char before P or after l
    "Perl "   # same as previous (space is not a word char)
    "'Perl'"  # the ' char is not a word char
    "Perl's"  # no word char before P, non-word char after "l"

These strings do not match /\bPerl\b/.

    "Perl_"   # _ is a word char!
    "Perler"  # no word char before P, but one after l

You don't have to use \b to match words though. You can look for non-word characters surrounded by word characters. These strings match the pattern /\b'\b/.

    "don't"   # the ' char is surrounded by "n" and "t"
    "qep'a'"  # the ' char is surrounded by "p" and "a"

These strings do not match /\b'\b/.

    "foo'"    # there is no word char after non-word '

You can also use the complement of \b, \B, to specify that there should not be a word boundary.

In the pattern /\Bam\B/, there must be a word character before the "a" and after the "m". These strings match /\Bam\B/:

    "llama"   # "am" surrounded by word chars
    "Samuel"  # same

These strings do not match /\Bam\B/:

    "Sam"      # no word boundary before "a", but one after "m"
    "I am Sam" # "am" surrounded by non-word chars

--------------------------------------------------------------------
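The FAQ 6.17 example lists are easy to verify mechanically. This sketch (the table layout and the `$failures` counter are just illustrative choices) runs every string from the answer against its pattern and reports whether each one matches or fails to match as the text says.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry: a pattern, the strings the FAQ says match it, and the
# strings the FAQ says do not match it.
my @checks = (
    [ qr/\bPerl\b/, [ "Perl", "Perl ", "'Perl'", "Perl's" ], [ "Perl_", "Perler" ] ],
    [ qr/\b'\b/,    [ "don't", "qep'a'" ],                   [ "foo'" ] ],
    [ qr/\Bam\B/,   [ "llama", "Samuel" ],                   [ "Sam", "I am Sam" ] ],
);

my $failures = 0;
for my $check (@checks) {
    my ($re, $should_match, $should_not) = @$check;
    for my $s (@$should_match) {
        my $ok = $s =~ $re;
        printf "%-4s %-12s =~ %s\n", ($ok ? "ok" : "FAIL"), qq{"$s"}, $re;
        $failures++ unless $ok;
    }
    for my $s (@$should_not) {
        my $ok = $s !~ $re;
        printf "%-4s %-12s !~ %s\n", ($ok ? "ok" : "FAIL"), qq{"$s"}, $re;
        $failures++ unless $ok;
    }
}

my $summary = $failures
    ? "$failures unexpected result(s)"
    : "all examples behave as described";
print "$summary\n";
```

Every line should report "ok", which is a quick way to convince yourself that \b and \B test positions between characters rather than the characters themselves.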
Posted on: Monday 5th November 2012 | Total views: 88

finding & saving accented (unicode) chars with perl 5.6.1

Hartmut Camphausen wrote:
>
> Hint: You are using $wrd as a RE several times. For the sake of
> efficiency you should compile it as a RE
>
> $wrd = qr/\/&.+;/

That will produce a syntax error because of the '&.+;/' after the compiled expression qr/\/

    $ perl -le'$wrd = qr/\/&.+;/'
    syntax error at -e line 1, near "&."
    Search pattern not terminated or ternary operator parsed as search pattern at -e line 1.

John
--
Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order. -- Larry Wall
Posted on: Monday 5th November 2012 | Total views: 169

FAQ 6.13 How do I process each word on each line?

--------------------------------------------------------------------

6.13: How do I process each word on each line?

Use the split function:

    while (<>) {
        foreach my $word ( split ) {
            # do something with $word here
        }
    }

Note that this isn't really a word in the English sense; it's just chunks of consecutive non-whitespace characters.

To work with only alphanumeric sequences (including underscores), you might consider

    while (<>) {
        foreach my $word (m/(\w+)/g) {
            # do something with $word here
        }
    }

--------------------------------------------------------------------
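The difference between the two FAQ 6.13 loops shows up as soon as a line contains punctuation. In this sketch, `for (@lines)` stands in for `while (<>)` so it runs without input, and the two sample lines are made up for illustration. Both loops see the same text, but split keeps punctuation attached to its chunk while m/(\w+)/g breaks "Perl's" into two words and drops the punctuation entirely.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input standing in for lines read with while (<>).
my @lines = (
    "Perl's regexes, (mostly) readable.\n",
    "  leading and trailing   spaces  \n",
);

my (@chunks, @words);
for (@lines) {                 # aliases $_ to each line, like while (<>)
    push @chunks, split;       # same as split ' ', $_ : runs of non-whitespace
    push @words,  m/(\w+)/g;   # every run of \w characters
}

print "split: ", join("|", @chunks), "\n";
print "\\w+:   ", join("|", @words),  "\n";
```

Leading and trailing whitespace produce no empty fields in either case, because split with no arguments uses the special ' ' pattern and m/(\w+)/g only returns non-empty runs.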
Posted on: Monday 5th November 2012 | Total views: 194

FAQ 6.19 What good is "\G" in a regular expression?

Posted on: Saturday 10th November 2012 | Total views: 99

FAQ 6.15 How can I do approximate matching?

Posted on: Sunday 11th November 2012 | Total views: 134

FAQ 6.13 How do I process each word on each line?

Posted on: Thursday 15th November 2012 | Total views: 81

FAQ 6.17 Why don't word-boundary searches with "\b" work for me?

Posted on: Thursday 15th November 2012 | Total views: 71

FAQ 6.18 Why does using $&, $`, or $' slow my program down?

Posted on: Monday 19th November 2012 | Total views: 89

Experts on embedding Perl in C wanted: Weird problem on RH7.3/Perl 5.6.1

I'm tearing my hair out on this one. I'm trying to embed a Perl interpreter into a C program. I need to be able to create and destroy the interpreter periodically, but will never actually have two interpreters at ...
Posted on: Monday 19th November 2012 | Total views: 158

FAQ 6.17 Why don't word-boundary searches with "\b" work for me?

Posted on: Sunday 25th November 2012 | Total views: 62

FAQ 6.12 What does it mean that regexes are greedy? How can I get around it?

Posted on: Monday 26th November 2012 | Total views: 104

FAQ 6.12 What does it mean that regexes are greedy? How can I get around it?

Posted on: Monday 26th November 2012 | Total views: 172

FAQ 6.19 What good is "\G" in a regular expression?

Posted on: Monday 26th November 2012 | Total views: 93

FAQ 6.13 How do I process each word on each line?

Posted on: Monday 26th November 2012 | Total views: 95

BerkeleyDB install errors (perl v5.6.1, MSWin32-x86-multi-thread) using PPM

Hi, I need some help. I am getting the following error while installing the module BerkeleyDB:

    PPM interactive shell (2.2.0) - type 'help' for available commands.
    PPM> install BerkeleyDB
    Install package 'BerkeleyDB' (y/N): y
    Installing package 'BerkeleyDB'...
    Error installing package 'BerkeleyDB': ...
Posted on: Monday 26th November 2012 | Total views: 157

FAQ 6.10 How do I use a regular expression to strip C style comments from a file?

Posted on: Tuesday 27th November 2012 | Total views: 537

FAQ 6.15 How can I do approximate matching?

Posted on: Tuesday 27th November 2012 | Total views: 414