
In a regular expression, which characters need escaping? [Resolved]

In general, which characters in a regular expression need escaping?

For example, the following is not syntactically correct:

echo '[]' | grep '[]'
grep: Unmatched [ or [^

This, however, is syntactically correct:

echo '[]' | grep '\[]'
[]

Is there any documentation on which characters should be escaped in a regular expression, and which should not?


Question Credit: LanceBaynes
Asked July 11, 2018
Posted Under: Unix Linux
5 Answers

This depends on the application. In your example, [ must be escaped in the argument to grep, where it is a regex metacharacter, but not in the argument to echo.

For the shell (from the POSIX specs):

Quoting is used to remove the special meaning of certain characters or words to the shell. Quoting can be used to preserve the literal meaning of the special characters in the next paragraph, prevent reserved words from being recognized as such, and prevent parameter expansion and command substitution within here-document processing (see Here-Document).

The application shall quote the following characters if they are to represent themselves:

|  &  ;  <  >  (  )  $  `  \  "  '  <space>  <tab>  <newline>

and the following may need to be quoted under certain circumstances. That is, these characters may be special depending on conditions described elsewhere in this volume of IEEE Std 1003.1-2001:

*   ?   [   #   ~   =   %

The various quoting mechanisms are the escape character, single-quotes, and double-quotes. The here-document represents another form of quoting; see Here-Document.
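
As a small illustration of the three quoting mechanisms named above, the sketch below produces the same literal text three ways. The variable names a, b and c are made up for the example, and it assumes a POSIX-style shell such as sh or bash.

```shell
# Goal: store the literal text $HOME* without parameter expansion.
a=\$HOME\*     # escape character: a backslash before each special character
b='$HOME*'     # single quotes: everything inside is literal
c="\$HOME*"    # double quotes: * is literal, but $ still needs a backslash

# All three lines printed here are identical.
printf '%s\n' "$a" "$b" "$c"
```

Single quotes are usually the least error-prone choice for regexes, since nothing inside them is special to the shell.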

Specific programs (using regexes, perl, awk) could have additional requirements on escaping.


credit: Gilles
Answered July 11, 2018

Each application will have its own set of 'special' characters. The issue that you ran into was with grep not the shell. For which characters need to be quoted in grep, read the manpage's section on "REGULAR EXPRESSIONS".

For the shell, the characters that should be quoted are:

;'"`#$&*?[]<>{}\

and any whitespace.

Depending on the shell, other characters may need to be quoted as well:

!^%

Look under "SHELL GRAMMAR" on the shell's manpage.


credit: Arcege
Answered July 11, 2018

There are multiple types of regular expressions, and the set of special characters depends on the particular type. Some of them are described below. In all cases, special characters are escaped with a backslash \. E.g. to match [ you write \[. Alternatively, the characters (except ^) can be escaped by enclosing them one at a time in square brackets, like [[].
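
A quick way to check that both escaping styles behave the same is to run them through grep. This is only a sketch with made-up sample input and variable names:

```shell
# Two input lines; only the first contains a literal [
with_backslash=$(printf '%s\n' 'a[b' 'ab' | grep 'a\[b')    # backslash escape
with_brackets=$(printf '%s\n' 'a[b' 'ab' | grep 'a[[]b')    # bracket expression

# Both patterns select only the line a[b
printf '%s\n' "$with_backslash" "$with_brackets"
```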

Characters that are special only in some contexts (such as ^, which is special at the beginning of a (sub-)expression) can be escaped in all contexts.

As others wrote: in the shell, if you do not enclose the expression in single quotes, you additionally have to escape the shell's special characters in the already escaped regex. Example: instead of '\[' you can write \\[ (alternatively: "\[" or "\\[") in Bourne-compatible shells like bash, but that is another story.
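
To see that these four spellings hand the same two characters to the program, you can store them and print them. The names q1 to q4 are just placeholders for this sketch, which assumes a Bourne-compatible shell:

```shell
q1='\['     # single quotes: passed through unchanged
q2=\\[      # unquoted: the shell turns \\ into a single \
q3="\["     # double quotes: \ before [ is kept as-is
q4="\\["    # double quotes: \\ becomes a single \

# Each variable now holds a backslash followed by [
printf '%s\n' "$q1" "$q2" "$q3" "$q4"

# So any of them works as the grep pattern from the question
echo '[]' | grep "$q2"
```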

Basic Regular Expressions (BRE)

  • POSIX: Basic Regular Expressions
  • Commands: grep, sed
  • Special characters: .[\
  • Special in some contexts: *^$
  • Escape a string: "$(printf '%s' "$string" | sed 's/[.[\*^$]/\\&/g')"
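
The BRE one-liner above can be wrapped in a small helper. The function name escape_bre and the sample strings are invented for this sketch; the sed expression is the one from the list.

```shell
# Hypothetical helper: backslash-escape the BRE special characters in $1.
escape_bre() {
    printf '%s' "$1" | sed 's/[.[\*^$]/\\&/g'
}

string='1.5*[x]'
pattern=$(escape_bre "$string")
printf '%s\n' "$pattern"

# Only the line that is literally 1.5*[x] is selected; 155[x] is not,
# because the dot no longer means "any character".
printf '%s\n' '1.5*[x]' '155[x]' | grep -- "$pattern"
```

Note that sed works line by line, so this sketch assumes the string contains no newlines.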

Extended Regular Expressions (ERE)

  • POSIX: Extended Regular Expressions
  • Commands: grep -E, GNU: sed -r, *BSD: sed -E
  • Special characters: .[\(
  • Special in some contexts: *^$)+?{|
  • Escape a string: "$(printf '%s' "$string" | sed 's/[.[\*^$()+?{|]/\\&/g')"
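
The ERE variant can be wrapped the same way. Again, escape_ere and the sample input are assumptions made for illustration; the sed expression is the one from the list.

```shell
# Hypothetical helper: backslash-escape the ERE special characters in $1.
escape_ere() {
    printf '%s' "$1" | sed 's/[.[\*^$()+?{|]/\\&/g'
}

string='a+b(c)|d?'
pattern=$(escape_ere "$string")
printf '%s\n' "$pattern"

# With the escaped pattern, grep -E selects only the literal line.
# Unescaped, the d? alternative can match the empty string, so every line would match.
printf '%s\n' 'a+b(c)|d?' 'aab' | grep -E -- "$pattern"
```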

credit: pabouk
Answered July 11, 2018

grep uses BRE by default. There is good documentation on it here. A general rundown would be "escape any special character or metacharacter to get its literal; escape to create escape sequences (\n, \r, etc.)", although this is not always true. For example, in BRE you have to escape ( and ) to give them their special meaning (grouping, which is what backreferences refer to).
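
A short sketch of the parenthesis case with plain grep (BRE), using made-up input and variable names:

```shell
# Escaped parentheses form a group, and \1 refers back to what it matched.
backref=$(printf '%s\n' 'abab' 'abcd' | grep '\(ab\)\1')

# Unescaped parentheses are ordinary characters in BRE.
literal=$(printf '%s\n' '(ab)' 'ab' | grep '(ab)')

# Prints abab, then (ab)
printf '%s\n' "$backref" "$literal"
```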


credit: Chris Down
Answered July 11, 2018