Regular Expression


Meta Characters


“ [ ] “

Specifies a Character class.

[aefg] → Matches any one of the characters a, e, f, or g.

[a-f] → Matches any one character in the range a to f.

[mkl$] → Matches any one of the characters m, k, l, or $. Meta characters inside a character class lose their special meaning.

[^a-z] → Complement; matches any character other than a lowercase letter.
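A quick way to check the four classes above is to run them through Python's re module. This is only a sketch; the sample string is made up for illustration.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "mask$ gef"

class_aefg = re.findall(r"[aefg]", text)        # any of a, e, f, g
class_range = re.findall(r"[a-f]", text)        # any character from a to f
class_dollar = re.findall(r"[mkl$]", text)      # $ is literal inside [ ]
class_negated = re.findall(r"[^a-z]", text)     # anything but a lowercase letter

print(class_aefg)     # ['a', 'g', 'e', 'f']
print(class_range)    # ['a', 'e', 'f']
print(class_dollar)   # ['m', 'k', '$']
print(class_negated)  # ['$', ' ']
```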

“ \ “

Gives special meaning to certain ordinary characters. Also used to escape meta characters so that they lose their special meaning.

\d → Matches any decimal digit; this is equivalent to the class [0-9].

\D → Matches any non-digit character; this is equivalent to the class [^0-9].

\s → Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].

\S → Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].

\w → Matches any alphanumeric character or underscore; this is equivalent to the class [a-zA-Z0-9_].

\W → Matches any character that is not alphanumeric or underscore; this is equivalent to the class [^a-zA-Z0-9_].

[\s,!] → Backslash sequences can be used inside a character class. This class matches any whitespace character, or “,” or “!”.
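The shorthand classes can be tried out the same way. The sketch below assumes Python's re module and a made-up ASCII sample string; note that in Python \w also matches the underscore.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "Hi, x_1! #"

digits = re.findall(r"\d", text)          # decimal digits
spaces = re.findall(r"\s", text)          # whitespace characters
word_chars = re.findall(r"\w", text)      # letters, digits and underscore
non_word = re.findall(r"\W", text)        # everything else
mixed_class = re.findall(r"[\s,!]", text) # whitespace, "," or "!"

print(digits)       # ['1']
print(spaces)       # [' ', ' ']
print(word_chars)   # ['H', 'i', 'x', '_', '1']
print(non_word)     # [',', ' ', '!', ' ', '#']
print(mixed_class)  # [',', ' ', '!', ' ']
```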

\b → Matches a word boundary, that is, the start or end of a word.

\B → Opposite of \b; matches only where there is no word boundary.

\A → Matches the start of the string.

\Z → Matches the end of the string.
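The anchors match positions rather than characters, which is easier to see from match offsets. A small sketch using Python's re module on a made-up string:

```python
import re

# Hypothetical sample text: "cat" appears as a whole word twice
# and once inside "concat".
text = "cat concat cat"

whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]
at_start = re.search(r"\Acat", text).start()
at_end = re.search(r"cat\Z", text).start()

print(whole_words)  # [0, 11]
print(inside_word)  # [7]
print(at_start)     # 0
print(at_end)       # 11
```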

“ . “

Matches any single character except a newline.
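A short sketch of the dot in Python's re module. The re.DOTALL flag shown at the end is not covered in this post; it is included only to show that the newline exception is a default that can be switched off.

```python
import re

# Hypothetical sample text containing a newline between "a" and "c".
text = "abc a\nc a-c"

default_dot = re.findall(r"a.c", text)              # "." skips the newline
dotall_dot = re.findall(r"a.c", text, re.DOTALL)    # "." matches newline too

print(default_dot)  # ['abc', 'a-c']
print(dotall_dot)   # ['abc', 'a\nc', 'a-c']
```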

“ * “

The preceding character can be matched zero or more times.

Ca*t → Matches Ct, Cat, Caat, Caaat, Caaaat and so on.

Note :
Repetitions such as d*c are greedy; when repeating a RE, the matching engine
will try to repeat it as many times as possible. If later portions of the pattern don’t
match, the matching engine will then back up and try again with fewer repetitions.
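Greedy matching and backing up can be observed directly. The sketch below uses Python's re module; the pattern a[bcd]*b and the input strings are illustrative choices, not taken from this post.

```python
import re

# Ca*t allows zero or more "a" characters between "C" and "t".
candidates = ["Ct", "Cat", "Caaat", "Cbt"]
star_ok = [w for w in candidates if re.fullmatch(r"Ca*t", w)]

# Greedy: a* takes as many "a" characters as it can.
greedy = re.match(r"a*", "aaab").group()

# Backing up: [bcd]* first grabs "bcbd", then gives characters back
# until the final "b" in the pattern can match.
backtracked = re.match(r"a[bcd]*b", "abcbd").group()

print(star_ok)      # ['Ct', 'Cat', 'Caaat']
print(greedy)       # aaa
print(backtracked)  # abcb
```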

“ + “

The preceding character can be matched one or more times.

Ca+t → Matches Cat, Caat, Caaat, Caaaat and so on but not Ct.

“ ? “

The preceding character can be matched zero times or once.

Ca?t → Matches only Ct or Cat.
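The difference between the "one or more" and "zero or one" repetitions shows up when both are run over the same words. A sketch with Python's re module, using the patterns Ca+t and Ca?t:

```python
import re

words = ["Ct", "Cat", "Caat"]

# "+" needs at least one "a"; "?" allows at most one.
one_or_more = [w for w in words if re.fullmatch(r"Ca+t", w)]
zero_or_one = [w for w in words if re.fullmatch(r"Ca?t", w)]

print(one_or_more)  # ['Cat', 'Caat']
print(zero_or_one)  # ['Ct', 'Cat']
```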

“ {m,n} “

The preceding character can be matched a minimum of m times and a maximum of n
times. Either m or n can be omitted. If m is omitted it is assumed to be 0; if n is omitted there is no upper limit.

Ca{1,3}t → Matches Cat, Caat, Caaat but not Ct, Caaaat, Caaaaat and so on.

Note : {0,} is the same as *. {1,} is the same as +. {,1} is the same as ?
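The bounds and the equivalences in the note can be spot-checked in Python's re module. The helper name matching is made up for this sketch, and comparing results on a handful of words only illustrates the equivalences; it does not prove them.

```python
import re

samples = ["Ct", "Cat", "Caat", "Caaat", "Caaaat"]

def matching(pattern):
    """Return the sample words that the pattern matches in full."""
    return [w for w in samples if re.fullmatch(pattern, w)]

bounded = matching(r"Ca{1,3}t")

# Each pair should select the same words from the samples.
same_as_star = matching(r"Ca{0,}t") == matching(r"Ca*t")
same_as_plus = matching(r"Ca{1,}t") == matching(r"Ca+t")
same_as_optional = matching(r"Ca{,1}t") == matching(r"Ca?t")

print(bounded)  # ['Cat', 'Caat', 'Caaat']
print(same_as_star, same_as_plus, same_as_optional)  # True True True
```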

“ | “

The or operator, which can be used between two regular expressions.

Chandra|Shekar → Matches either Chandra or Shekar.
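In Python's re module the or operator is written as a vertical bar. A minimal sketch, with a made-up sample string:

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "Chandra Shekar Azad"

# Either alternative may match at each position in the text.
either = re.findall(r"Chandra|Shekar", text)

print(either)  # ['Chandra', 'Shekar']
```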

“ ^ “

Matches the beginning of the string; in multi-line mode it also matches at the beginning of each line.

“ $ “

Matches the end of the string; in multi-line mode it also matches at the end of each line.
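The effect of the mode is easiest to see side by side. This sketch assumes Python's re module, where multi-line mode is selected with the re.MULTILINE flag; the three-line sample string is made up.

```python
import re

# Hypothetical three-line sample text.
text = "one\ntwo\nthree"

start_default = re.findall(r"^\w+", text)                # start of string only
start_multi = re.findall(r"^\w+", text, re.MULTILINE)    # start of every line
end_default = re.findall(r"\w+$", text)                  # end of string only
end_multi = re.findall(r"\w+$", text, re.MULTILINE)      # end of every line

print(start_default)  # ['one']
print(start_multi)    # ['one', 'two', 'three']
print(end_default)    # ['three']
print(end_multi)      # ['one', 'two', 'three']
```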

“ ( ) “

Used to group characters so that repetition meta characters can be applied to the whole group.

Ch(an)*dra → Matches Chdra, Chandra, Chanandra, Chananandra and so on.
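A sketch of group repetition in Python's re module. The word Chadra is added as a made-up negative case, and the group(1) lookup shows a Python detail: a repeated group reports only its last repetition.

```python
import re

words = ["Chdra", "Chandra", "Chanandra", "Chadra"]

# (an)* repeats the two-character group as a unit, zero or more times.
grouped = [w for w in words if re.fullmatch(r"Ch(an)*dra", w)]

# group(1) holds the last repetition, or None if the group never matched.
last_repeat = re.fullmatch(r"Ch(an)*dra", "Chanandra").group(1)
no_repeat = re.fullmatch(r"Ch(an)*dra", "Chdra").group(1)

print(grouped)      # ['Chdra', 'Chandra', 'Chanandra']
print(last_repeat)  # an
print(no_repeat)    # None
```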


