
C. Querying using regular expressions

GNATS uses GNU regular expression syntax with these settings:

 
RE_SYNTAX_POSIX_EXTENDED | RE_BK_PLUS_QM & RE_DOT_NEWLINE

This means that parentheses (`(' and `)') and pipe symbols (`|') do not need to be escaped with a backslash (`\'). The tokens `+' and `?', however, do need the backslash: write `\+' and `\?' to use them as operators. A bare `+' or `?' matches itself literally.
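To make the `+'/`?' convention concrete, here is a minimal sketch of a hypothetical helper, `gnats_to_python', that rewrites a GNATS-style pattern into the syntax of Python's `re' module, where the convention is reversed (bare `+' and `?' are operators). It is an illustration only: it handles this one difference and assumes away bracket expressions and every other way the two syntaxes may differ.

```python
import re


def gnats_to_python(pattern: str) -> str:
    """Hypothetical helper: convert the RE_BK_PLUS_QM convention to Python's.

    In GNATS patterns, backslash-plus and backslash-question are the
    repetition operators and a bare + or ? is a literal character.
    Python's re module uses the opposite convention. Bracket
    expressions and other syntax differences are NOT handled.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \+ and \? become the bare Python operators + and ?;
            # any other escape is copied through unchanged.
            out.append(nxt if nxt in "+?" else ch + nxt)
            i += 2
        elif ch in "+?":
            # A bare + or ? is literal in GNATS, so escape it for Python.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# GNATS pattern for "one or more a's, then b"
print(gnats_to_python(r"a\+b"))                                # a+b
print(bool(re.match(gnats_to_python(r"a\+b"), "aaab")))        # True
# In GNATS, "c++" matches the literal text "c++"
print(bool(re.fullmatch(gnats_to_python("c++"), "c++")))       # True
```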

Unfortunately, we do not have room in this manual for an adequate tutorial on regular expressions. The following is a basic summary of some regular expressions you might wish to use.

See section `Regular Expression Syntax' in Regex, for details on regular expression syntax. Also see section `Syntax of Regular Expressions' in GNU Emacs Manual, but beware that the syntax for regular expressions in Emacs is slightly different.

All search criteria options to query-pr rely on regular expression syntax to construct their search patterns. For example,

 
query-pr --state=open

matches all PRs whose `>State:' values match the regular expression `open'.

Following GNU regular expression syntax, we can substitute the expression `o' for `open'. Because the match is anchored at the beginning of the field, this matches all values of `>State:' that begin with the letter `o'.

 
query-pr --state=o

is equivalent to

 
query-pr --state=open

in this case, since `open' is the only value for `>State:' that matches the expression `o'. `--state=o' also matches `o', `oswald', and even `oooooo', but none of those values are valid states for a Problem Report.
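The anchored matching described above can be tried out with a short Python sketch. It uses Python's `re.match', which, like GNU `re_match', anchors at the start of the string; it is a stand-in for illustration, not the matcher query-pr itself runs. The list of states is assumed to be the default GNATS set.

```python
import re

# Default GNATS >State: values (assumed here for illustration).
STATES = ["open", "analyzed", "suspended", "feedback", "closed"]


def matching_states(pattern):
    """Return the states that match `pattern` at the start of the value.

    re.match anchors at the beginning of the string, mirroring the
    anchored matching used for options other than --text/--multitext.
    """
    return [s for s in STATES if re.match(pattern, s)]


print(matching_states("open"))  # ['open']
print(matching_states("o"))     # ['open']

# Strings that are not valid states would match "o" too.
print([v for v in ["o", "oswald", "oooooo"] if re.match("o", v)])
```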

Regular expression syntax considers a regexp token surrounded with parentheses, as in `(regexp)', to be a group. This means that `(ab)*' matches any number of contiguous instances of `ab', including zero. Matches include the empty string, `ab', and `ababab'.
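A quick way to see which strings the group `(ab)*' describes is to test candidates against the whole string. This sketch uses Python's `re.fullmatch' as a stand-in for the GNU matcher; the group and `*' behave the same way in both syntaxes.

```python
import re

# re.fullmatch succeeds only if the entire string matches (ab)*.
for text in ["", "ab", "ababab", "aba", "ba"]:
    print(repr(text), bool(re.fullmatch("(ab)*", text)))
# '' True
# 'ab' True
# 'ababab' True
# 'aba' False
# 'ba' False
```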

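Regular expression syntax considers characters surrounded with square brackets, as in `[abc]', to be a list. A list matches exactly one character, any one of the characters listed, so `[cda]' matches a single `c', `d', or `a'. A list cannot match whole words: to match any of the words `Charley', `Charlene', or `Charbroiled', use a group with alternatives instead, as in `Char(ley|lene|broiled)' (case is significant; `charbroiled' is not matched).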

Using groups and lists, we see that

 
query-pr --category="gcc|gdb|gas"

is equivalent to

 
query-pr --category="g(cc|db|as)"

and is also very similar to

 
query-pr --category="g[cda]"

with the exception that this last search matches any value that begins with `gc', `gd', or `ga' (such as `gawk'), not only values beginning with `gcc', `gdb', or `gas'.
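The three category searches can be compared side by side with a small Python sketch. The category names are hypothetical, chosen only to expose the difference, and Python's `re.match' stands in for the anchored GNU matcher; alternation, groups, and lists use the same syntax in both.

```python
import re

# Hypothetical category names, for illustration only.
CATEGORIES = ["gcc", "gdb", "gas", "gawk", "gdbserver", "make"]


def matching(pattern):
    # Anchored at the start of each value, like re_match.
    return [c for c in CATEGORIES if re.match(pattern, c)]


print(matching("gcc|gdb|gas"))  # ['gcc', 'gdb', 'gas', 'gdbserver']
print(matching("g(cc|db|as)"))  # ['gcc', 'gdb', 'gas', 'gdbserver']
print(matching("g[cda]"))       # ['gcc', 'gdb', 'gas', 'gawk', 'gdbserver']
```

The first two patterns select the same values; the list version additionally picks up `gawk', because `[cda]' checks only the single character after `g'.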

The `.' character is known as a wildcard: `.' matches any single character except a newline. `*' matches the preceding character, list, or group any number of times, including zero. Therefore, we can understand `.*' to mean "match zero or more instances of any character." For this reason, there is no need to put `.*' at the end of a regular expression; it would be redundant. The expression `o' matches the letter `o' at the beginning of the field (followed by anything), and `o.*' matches exactly the same values: the letter `o' at the beginning of the field followed by any number (including zero) of any characters.
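The redundancy of a trailing `.*' in an anchored match can be checked directly. In this sketch Python's `re.match' stands in for the GNU matcher; by default Python's `.' also excludes newlines, which is the behavior described above.

```python
import re

values = ["open", "oswald", "closed", "o"]

# "o" and "o.*" accept exactly the same values in an anchored match;
# the trailing ".*" only changes how much text the match consumes.
for v in values:
    print(v, bool(re.match("o", v)), bool(re.match("o.*", v)))
# open True True
# oswald True True
# closed False False
# o True True

# "." does not match a newline, so ".*" stops at the end of the line.
print(re.match("o.*", "one\ntwo").group())  # one
```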

We can also use the expression operator `|' to signify a logical OR, such that

 
query-pr --state="o|a"

matches all `open' or `analyzed' Problem Reports. (Double quotes (") are used to protect the pipe symbol (|) from the shell.)
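The same alternation can be tried in a Python sketch, with `re.match' standing in for the anchored GNU matcher and the default GNATS states assumed as the value list.

```python
import re

# Default GNATS >State: values (assumed here for illustration).
STATES = ["open", "analyzed", "suspended", "feedback", "closed"]

# "|" separates alternatives; each is tried at the start of the value.
print([s for s in STATES if re.match("o|a", s)])  # ['open', 'analyzed']
```

The quotes in the shell command exist only for the shell's benefit. A program that starts query-pr without a shell, for example by passing an argument list to Python's `subprocess.run', would pass `--state=o|a' as a single argument with no quoting.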

By the same token,(5) using

 
query-pr --state=".*a"

matches all values for `>State:' which contain an `a'. (These include `analyzed' and `feedback'.)

Another way to understand what wildcards do is to follow them as they search for matching text. In our syntax, `.*' matches any character any number of times, including zero. Therefore, `.*a' matches any run of characters that ends with `a', ignoring the rest of the field. `.*a' matches `analyzed' as well as `feedback'. Because `*' matches as much text as it can, the matched text extends to the last `a' in the value (`ana' in `analyzed'), not the first; for a search, however, all that matters is that some match exists.
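Following `.*a' on its search is easy to do in Python, used here as a stand-in for the GNU matcher (the default GNATS states are assumed). The sketch prints both which states match and how much text each match covers.

```python
import re

# Default GNATS >State: values (assumed here for illustration).
STATES = ["open", "analyzed", "suspended", "feedback", "closed"]

print([s for s in STATES if re.match(".*a", s)])  # ['analyzed', 'feedback']

# ".*" is greedy: it backs off only as far as needed, so each match
# ends at the LAST "a" in the value.
print(re.match(".*a", "analyzed").group())  # ana
print(re.match(".*a", "feedback").group())  # feedba
```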

Note: When using `--text' or `--multitext', you do not need a leading `.*' to match text that appears anywhere in the field. For the technically minded, this is because `--text' and `--multitext' use `re_search' rather than `re_match': `re_match' anchors the search at the beginning of the field, while `re_search' does not anchor the search.

For example, to search the `>Description:' field for the text

 
The defrobulator component returns a nil value.

we can use

 
query-pr --multitext="defrobulator.*nil"
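Python's `re' module has a similar pair of functions, which makes the difference easy to see: `re.match' anchors at the start like `re_match', and `re.search' scans the whole text like `re_search'. This sketch applies both to the description above; it illustrates the concept and is not how query-pr is implemented.

```python
import re

description = "The defrobulator component returns a nil value."

# Anchored (like re_match): the field does not START with "defrobulator".
print(re.match("defrobulator.*nil", description))          # None
print(bool(re.match(".*defrobulator.*nil", description)))  # True

# Unanchored (like re_search, used by --text and --multitext):
# no leading ".*" is needed.
print(bool(re.search("defrobulator.*nil", description)))   # True
```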

To also match newlines, we have to include the expression `(.|^M)' instead of just a dot (`.'). `(.|^M)' matches "any single character except a newline (`.') or (`|') any newline (`^M')." This means that to search for the text

 
The defrobulator component enters the bifrabulator routine
and returns a nil value.

we must use

 
query-pr --multitext="defrobulator(.|^M)*nil"
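The effect of `(.|^M)' can be reproduced in a short Python sketch. Python's `re.search' stands in for `re_search', and the Python string writes the newline character as `\n' instead of typing it literally.

```python
import re

# Two-line description; "\n" is the newline between the lines.
description = ("The defrobulator component enters the bifrabulator routine\n"
               "and returns a nil value.")

# "." does not cross the line break, so this search fails...
print(re.search("defrobulator.*nil", description))               # None
# ...while "(.|\n)" matches any character, including a newline.
print(bool(re.search(r"defrobulator(.|\n)*nil", description)))   # True
```

In Python code the same effect is more commonly obtained with the `re.DOTALL' flag, which makes `.' match newlines too; the `(.|\n)' form is shown because it mirrors the query-pr pattern.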

To generate the newline character `^M', type the following depending on your shell:

csh
`control-V control-M'

tcsh
`control-V control-J'

sh (or bash)
Use the RETURN key, as in

 
(.|
)

Again, see section `Regular Expression Syntax' in Regex, for a much more complete discussion on regular expression syntax.



This document was generated by XEmacs Webmaster on October 2, 2007 using texi2html