Regular Expressions

Regular Expressions provide a means for identifying strings of text such as particular characters, words or patterns of characters.  

In Model Review, some of the more common uses for regular expressions are checking adherence to naming conventions and searching annotations in the project model for very specific information.

Special Characters

Regular Expressions employ special characters to provide more flexibility in defining matches. The "special characters" are:

+ * ? . [ ] ^ ( ) | \

The following sections describe how to use each of the special characters:

Period

A period (".") will match any one character.

Expression: 390-.
Meaning: Match the string "390-" followed by any character.
Matches: 390-A, 390-1, 390--
Does Not Match: 390-A1, 1390-1

Expression: Revision . Released
Meaning: Match the string "Revision " followed by any character and then the string " Released".
Matches: Revision A Released, Revision 1 Released, Revision # Released
Does Not Match: Revision A1 Released, RevisionAReleased
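One way to try these expressions outside Model Review is Python's `re` module, which uses the same special characters described here. The sketch below is an approximation under two assumptions inferred from the tables in this section, not documented Model Review behavior: the expression must match the entire value, and letter case is ignored (for example, `[a-z]` is shown matching "A"). `re.fullmatch` with `re.IGNORECASE` models both; the helper name `matches` is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches all of `text`.

    Assumes whole-value, case-insensitive matching, as the tables suggest.
    """
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


# Rows from the Period table: "." stands for exactly one character.
for text in ["390-A", "390-1", "390--", "390-A1", "1390-1"]:
    print(f"{text!r}: {matches(r'390-.', text)}")

print(matches(r"Revision . Released", "Revision # Released"))  # True
print(matches(r"Revision . Released", "RevisionAReleased"))    # False
```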

Square Brackets

Square Brackets ("[ ]") define a character class, which matches any single character that appears inside the brackets. Inside the brackets, the special characters lose their special meaning, except for "^": when "^" is the first character inside the brackets, the class matches any single character that is NOT one of the listed characters.

Ranges, such as a-z or 0-9, can also be used inside the square brackets.

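Expression: [akm]
Meaning: One character: either a, k, or m.
Matches: a, k, m
Does Not Match: Akm, ak, G

Expression: [a-z]
Meaning: Any single letter.
Matches: A, b, c, d
Does Not Match: 1, 2, -, #

Expression: [^akm]
Meaning: One character, as long as it is NOT a, k, or m.
Matches: C, f, G
Does Not Match: Am (because it is two characters), A, k, m

Expression: [0-9]
Meaning: Any single digit.
Matches: 0, 4, 7
Does Not Match: A, #, z

Expression: [a-z][a-z]
Meaning: Any two letters.
Matches: AB, BC, DE
Does Not Match: A (only one letter), A1, 12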
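The character-class rows above can be checked with a short Python sketch. It assumes whole-value, case-insensitive matching, which is inferred from the table examples rather than confirmed Model Review behavior; `matching` is a hypothetical helper that returns only the sample values an expression accepts.

```python
import re


def matching(pattern: str, samples: list[str]) -> list[str]:
    """Hypothetical helper: return the samples that `pattern` matches in full.

    Assumes whole-value, case-insensitive matching, as the tables suggest.
    """
    return [s for s in samples if re.fullmatch(pattern, s, re.IGNORECASE)]


rows = {
    r"[akm]": ["a", "k", "m", "Akm", "ak", "G"],
    r"[a-z]": ["A", "b", "1", "-", "#"],
    r"[^akm]": ["C", "f", "G", "Am", "A", "k", "m"],
    r"[0-9]": ["0", "4", "7", "A", "#", "z"],
    r"[a-z][a-z]": ["AB", "BC", "A", "A1", "12"],
}
for pattern, samples in rows.items():
    # Each line lists only the samples the class accepts.
    print(pattern, matching(pattern, samples))
```

Because case is ignored in this sketch, "A" is rejected by `[^akm]` just as "a" would be, which agrees with the table.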

Asterisk

An Asterisk ("*") follows an expression and indicates that the preceding expression can occur zero or more times.

Expression: Ab*c
Meaning: "A" followed by zero or more "b" characters, ending with "c".
Matches: Ac, Abc, Abbbbbbbc
Does Not Match: Bbb, Abcd

Expression: [a-z]*
Meaning: Any number of letters and nothing but letters, including zero letters.
Matches: A, Bob, AAAAA, Steel, <blank> (because * allows zero occurrences)
Does Not Match: STEEL230, 12, AA-##
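A short Python sketch of the asterisk rows follows. It assumes, based on the tables rather than on Model Review documentation, that an expression must match the whole value and that case is ignored; the helper `matches` is hypothetical. The empty string stands in for the `<blank>` value.

```python
import re


def matches(pattern: str, text: str) -> bool:
    # Hypothetical helper; assumes whole-value, case-insensitive matching.
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


print(matches(r"Ab*c", "Ac"))          # True: zero b's is allowed
print(matches(r"Ab*c", "Abbbbbbbc"))   # True: many b's
print(matches(r"Ab*c", "Abcd"))        # False: "d" follows the final "c"
print(matches(r"[a-z]*", ""))          # True: a blank value has zero letters
print(matches(r"[a-z]*", "STEEL230"))  # False: digits are not letters
```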

Plus Sign

A Plus Sign ("+") follows an expression and indicates that the preceding expression can occur one or more times.

Expression: Ab+c
Meaning: "A" followed by one or more "b" characters, ending with "c".
Matches: Abc, Abbbbbbbc
Does Not Match: Ac, Bbb, Abcd

Expression: [a-z]+
Meaning: One or more letters and nothing but letters.
Matches: Bob, AAAAA, Steel
Does Not Match: STEEL230, 12, AA-##, <blank> (because + requires at least one occurrence)
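The difference between "*" and "+" is easiest to see side by side. This Python sketch assumes whole-value, case-insensitive matching (inferred from the tables, not documented Model Review behavior); `matches` is a hypothetical helper.

```python
import re


def matches(pattern: str, text: str) -> bool:
    # Hypothetical helper; assumes whole-value, case-insensitive matching.
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


# "*" accepts zero occurrences, "+" requires at least one.
for pattern in (r"Ab*c", r"Ab+c"):
    print(pattern, matches(pattern, "Ac"), matches(pattern, "Abc"))

for pattern in (r"[a-z]*", r"[a-z]+"):
    print(pattern, matches(pattern, ""), matches(pattern, "Steel"))
```

For "Ac" and for the blank value, the "*" versions report True and the "+" versions report False; all four expressions accept "Abc" or "Steel".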

Question Mark

A Question Mark ("?") follows an expression to indicate that the preceding expression is optional (it can occur zero times or one time).

Expression: Ab?c
Meaning: "A" followed by an optional "b", ending with "c".
Matches: Ac, Abc
Does Not Match: Abbc, Abcd

Expression: 390-[a-z][a-z]?
Meaning: "390-" followed by one letter and an optional second letter.
Matches: 390-A, 390-AB
Does Not Match: 390-11, 390-, 390-ABC
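The optional second letter can be exercised with Python's `re` module. The sketch assumes, from the tables rather than from Model Review documentation, that an expression must match the whole value and ignores case; `matches` and `PATTERN` are hypothetical names.

```python
import re


def matches(pattern: str, text: str) -> bool:
    # Hypothetical helper; assumes whole-value, case-insensitive matching.
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


PATTERN = r"390-[a-z][a-z]?"  # one required letter, then one optional letter
for text in ["390-A", "390-AB", "390-11", "390-", "390-ABC"]:
    print(f"{text:8} -> {matches(PATTERN, text)}")
```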

Pipe

The pipe ("|") character operates as an OR between two expressions (usually enclosed in parentheses).

Expression: (390|241)-[a-z]+
Meaning: Either "390" or "241", followed by a "-" and one or more letters.
Matches: 390-A, 241-A, 241-AB
Does Not Match: 200-A, 241, 241-

Expression: As per (MS2377|CS123)
Meaning: "As per " followed by either "MS2377" or "CS123".
Matches: As per MS2377, As per CS123
Does Not Match: As per, As per MS3222

Expression: 390-([abc]|[123])
Meaning: "390-" followed by either an "a", "b", or "c", or a "1", "2", or "3".
Matches: 390-A, 390-3
Does Not Match: 390-F, 390-
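In Python's `re` (and most regular expression engines), "|" has the lowest precedence, so without parentheses the OR splits the entire expression. The sketch below shows this with `re.fullmatch`. It assumes, as the tables suggest but the text does not state, that Model Review matches the whole value, ignores case, and groups alternatives the same way; `matches` is a hypothetical helper.

```python
import re


def matches(pattern: str, text: str) -> bool:
    # Hypothetical helper; assumes whole-value, case-insensitive matching.
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


grouped = r"(390|241)-[a-z]+"   # ("390" OR "241"), then "-" and letters
ungrouped = r"390|241-[a-z]+"   # "390" OR ("241-" and letters)

print(matches(grouped, "390-A"))    # True
print(matches(ungrouped, "390-A"))  # False: the first branch is only "390"
print(matches(ungrouped, "390"))    # True
print(matches(r"As per (MS2377|CS123)", "As per CS123"))  # True
```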

Backslash for Escaping Special Characters

To match a character that is itself a "special character," put a backslash in front of it. The backslash tells Model Review to take the character literally rather than as a special character.

Expression: [0-9]\+
Meaning: A single digit followed by a "+".
Matches: 1+, 2+
Does Not Match: 1, A, 1+1

Expression: What\?
Meaning: "What" followed by a question mark.
Matches: What?
Does Not Match: What's Up?
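The escaping rows can be reproduced in Python, which also offers `re.escape` to add the backslashes automatically when an expression is built from literal text. `re.escape` is a Python feature, not a Model Review one. The sketch assumes whole-value, case-insensitive matching, inferred from the tables; `matches` and `literal` are hypothetical names.

```python
import re


def matches(pattern: str, text: str) -> bool:
    # Hypothetical helper; assumes whole-value, case-insensitive matching.
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


print(matches(r"[0-9]\+", "1+"))    # True: "\+" is a literal plus sign
print(matches(r"[0-9]\+", "1+1"))   # False: extra text after the "+"
print(matches(r"What\?", "What?"))  # True

# Python-only convenience: escape every special character in literal text.
literal = re.escape("As per MS2377 (rev. A)")
print(matches(literal, "As per MS2377 (rev. A)"))  # True
print(matches(literal, "As per MS2377 (rev- A)"))  # False: "." is now literal
```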

Starts With / Ends With

A common task in Model Review is checking whether a value starts with or ends with particular text. Because an expression must match the entire value, this syntax is one difference for people used to "search" style regular expressions, which can match anywhere in the text. The recommended approach is to add ".*" or ".+" to the end of the expression to indicate Starts With, or to the front of the expression to indicate Ends With.

Expression: 390-.*
Meaning: Starts with "390-" (followed by anything, including nothing).
Matches: 390-1, 390-111
Does Not Match: 1390-1

Expression: 390-.+
Meaning: Starts with "390-" (followed by anything, but at least one character).
Matches: 390-1, 390-111
Does Not Match: 1390-1, 390-

Expression: .*-[a-z]
Meaning: Ends with a "-" and a letter (preceded by anything, including nothing).
Matches: Revision-A, Rev-A, -A
Does Not Match: Revision-A1, Rev-1, 123, -1

Expression: .+-[a-z]
Meaning: Ends with a "-" and a letter (preceded by anything, but at least one character).
Matches: Revision-A, Rev-A
Does Not Match: Revision-A1, Rev-1, 123, -1, -A
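The contrast with "search" style matching can be seen in Python's `re` module, where `re.search` looks for the expression anywhere in the text and `re.fullmatch` requires it to cover the whole value. The sketch assumes Model Review behaves like `re.fullmatch` with case ignored, which is inferred from the tables rather than documented; `matches` is a hypothetical helper.

```python
import re


def matches(pattern: str, text: str) -> bool:
    # Hypothetical helper; assumes whole-value, case-insensitive matching,
    # which is why ".*" or ".+" is needed at either end.
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


print(matches(r"390-.*", "390-"))  # True: ".*" allows nothing after "390-"
print(matches(r"390-.+", "390-"))  # False: ".+" needs at least one character
print(matches(r".*-[a-z]", "-A"))  # True
print(matches(r".+-[a-z]", "-A"))  # False

# A "search"-style engine finds the expression anywhere in the text instead:
print(re.search(r"390-", "1390-1") is not None)  # True
print(matches(r"390-.*", "1390-1"))               # False
```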

Putting It All Together

Regular Expressions are a powerful (but somewhat complex) approach to matching text. To address more complex requirements, you may need to become proficient at combining multiple simple expressions into one complex expression.

Some examples of complex expressions:

Expression: [0-9]+[-]?[0-9]+
Meaning: Digits with an optional dash in the middle.
Matches: 123-45, 12345
Does Not Match: 12A32, 1232-A, A

Expression: .*[^_]
Meaning: Cannot end with an underscore (_). The value must also contain at least one character.
Matches: 123324, PART1
Does Not Match: 12343_

Expression: (390|231)-[a-z0-9]+-[0-9]+
Meaning: Either "390" or "231", followed by a "-", an alphanumeric section of at least one character, then a "-" and at least one digit.
Matches: 390-mypart-1, 231-bracket-99
Does Not Match: 120-mypart-1, 380- -
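Since naming conventions are one of the common uses mentioned in the introduction, the sketch below applies the complex expressions above to lists of sample names and reports the ones that break each rule. It uses Python's `re.fullmatch` with `re.IGNORECASE` on the assumption, inferred from the tables rather than documented, that Model Review requires a whole-value, case-insensitive match; `PART_NAME` and `violations` are hypothetical names.

```python
import re

# Hypothetical naming rule, taken from the last expression above.
PART_NAME = r"(390|231)-[a-z0-9]+-[0-9]+"


def violations(names: list[str], pattern: str = PART_NAME) -> list[str]:
    """Return the names that do not fully match `pattern` (case ignored)."""
    return [n for n in names if re.fullmatch(pattern, n, re.IGNORECASE) is None]


names = ["390-mypart-1", "231-bracket-99", "120-mypart-1", "380- -"]
print(violations(names))  # ['120-mypart-1', '380- -']
print(violations(["123-45", "12345", "12A32"], r"[0-9]+[-]?[0-9]+"))  # ['12A32']
print(violations(["PART1", "12343_", ""], r".*[^_]"))  # ['12343_', '']
```

The blank name in the last call is reported because `.*[^_]` needs at least one non-underscore character at the end.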