This article is a continuation of Regex Primer - Part 1. If you haven't read that yet, it might be prudent to do so first. You can test the queries as you go here.
Ok! We've been introduced to []. One important feature we haven't covered is negation, which lets you say something like "any character but a".
[^a] # matches b,c,d,e,f,\n .... anything but a
When the ^ is the first character inside the box, the negation applies to all the characters within the box. You can't pick and choose.
[^0123456789] # matches anything but a numeric value
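To see negation in action, here is a minimal sketch in Python using the standard `re` module. The helper `first_match` is a made-up name for this illustration, not something from Part 1.

```python
import re

# [^a] matches any single character that is not "a" (newlines included).
not_a = re.compile(r"[^a]")
# [^0123456789] matches any single character that is not a digit.
non_digit = re.compile(r"[^0123456789]")

def first_match(pattern, text):
    """Return the first piece of text the pattern matches, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_match(not_a, "aab"))        # b
print(first_match(not_a, "aaa"))        # None
print(first_match(non_digit, "42x7"))   # x
print(first_match(not_a, "a\n") == "\n")  # True
```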
We now have adequate knowledge under our belt to start with some real life examples. One of the easiest places to use regex is string validation. For example, let's validate a telephone number in the form 555-1234567. Typically you could start splitting the string and then casting both halves to numbers, but we know regex now, so let's do it quickly:
555-[0-9]{7}
Done. We are saying match 555 followed by a dash followed by exactly 7 digits.
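Here is how that validation might look in Python. This is only a sketch: `is_valid_phone` is a hypothetical helper name, and it assumes you want the whole string to be the phone number, which is why it uses `fullmatch` instead of `search`.

```python
import re

PHONE = re.compile(r"555-[0-9]{7}")

def is_valid_phone(text):
    # fullmatch only succeeds if the entire string fits the pattern.
    return PHONE.fullmatch(text) is not None

print(is_valid_phone("555-1234567"))   # True
print(is_valid_phone("555-123456"))    # False (only 6 digits)
print(is_valid_phone("555-12345678"))  # False (8 digits)
print(is_valid_phone("556-1234567"))   # False (wrong prefix)

# search, by contrast, is happy to find the pattern inside a longer string.
print(PHONE.search("call 555-12345678 now") is not None)  # True
```

The last line is the reason to prefer `fullmatch` for validation: without it, extra characters before or after the number slip through.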
But we can go even shorter!
One thing you'll notice is that after a while you start using ranges like [0-9] and [a-z] an awful lot. They pop up all the time, so shorthands exist for the most commonly used ones. In this case we can use \d, which stands for digit and is equivalent to [0-9] (some Unicode-aware engines also let it match digits from other scripts).
555-\d{7} # this is exactly the same as 555-[0-9]{7}
Here are the other shorthands:
note: You don't have to know these. You can achieve the exact same effect by just typing it out using the box form. They are there for convenience.
| shorthand | meaning | semantically identical to |
|---|---|---|
| \d | digit | [0-9] |
| \w | word | [a-zA-Z0-9_] Note the underscore is also included |
| \s | whitespace: space, tab or newline | [ \t\r\n] (most engines also include \f and \v) |
| \D | anything but a digit | [^\d] |
| \W | anything but a word | [^\w] |
| \S | anything but a space | [^\s] |
TIP: each shorthand has an exact opposite, written as its uppercase form. You only really need to remember the first 3.
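If you want to convince yourself that the shorthands and the box forms agree, a quick check in Python could look like the sketch below. One assumption to flag: Python's `\d`, `\w` and `\s` are Unicode-aware by default, so the `re.ASCII` flag is passed to get the exact ranges from the table.

```python
import re

text = "Room_42 is\topen"

# re.ASCII restricts \d, \w and \s to the ASCII ranges shown in the table.
digits = re.findall(r"\d", text, re.ASCII)
words = re.findall(r"\w+", text, re.ASCII)
spaces = re.findall(r"\s", text, re.ASCII)

print(digits)  # ['4', '2']
print(words)   # ['Room_42', 'is', 'open']
print(spaces)  # [' ', '\t']

# The uppercase forms match exactly what the lowercase forms do not.
print(re.findall(r"\W", "a-b c", re.ASCII))     # ['-', ' ']
print(re.findall(r"[^\w]", "a-b c", re.ASCII))  # ['-', ' ']

# Without re.ASCII, Python's \d also accepts digits from other scripts.
arabic_three = "\u0663"
print(re.fullmatch(r"\d", arabic_three) is not None)            # True
print(re.fullmatch(r"\d", arabic_three, re.ASCII) is not None)  # False
```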
The dot is a special shorthand. I am providing it mainly because if you are reading someone else's code you will see it pop up. It basically says match any character except the newline. The problem is that what constitutes a newline isn't always consistent across platforms and regex engines.
.* # match it all (try it on the demo page and see its power...)
.* # same as [^\n]*
The dot is misused quite a bit, so generally I would recommend being specific and using a combination of the other shorthands and the box.
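A small Python sketch shows where the dot stops. The sample text is made up, and `re.DOTALL` is Python's name for the switch that lets the dot cross newlines; other engines call it something else.

```python
import re

text = "first line\nsecond line"

# By default the dot refuses to match "\n", so .* stops at the end of line one.
plain = re.match(r".*", text).group(0)
# The box form behaves the same way.
boxed = re.match(r"[^\n]*", text).group(0)
# With re.DOTALL the dot matches newlines too and swallows everything.
everything = re.match(r".*", text, re.DOTALL).group(0)

print(plain)               # first line
print(plain == boxed)      # True
print(everything == text)  # True
```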
Sometimes you want to actually find a dot, or the [ character. Since these and others have special meaning and are part of the regex syntax, you have to escape them with a backslash when you want to search specifically for these characters. For example:
\. # actually looks for just a DOT instead of everything
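The difference is easy to see in Python. As a sketch, the snippet below also uses `re.escape`, a standard library function that adds the backslashes for you when the text to search for comes from somewhere else; the sample strings are made up.

```python
import re

# An unescaped dot matches every character; an escaped one only a literal dot.
print(re.findall(r".", "a.b"))    # ['a', '.', 'b']
print(re.findall(r"\.", "a.b"))   # ['.']

# The same goes for the opening bracket.
print(re.findall(r"\[", "x[0]"))  # ['[']

# re.escape builds a pattern that matches the given text literally.
literal = re.escape("a.b[0]")
print(re.fullmatch(literal, "a.b[0]") is not None)  # True
print(re.fullmatch(literal, "axb[0]") is not None)  # False
```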
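Ok, let's get back to validation. This time an email address. We'll keep it somewhat simpler than the real specs and provide the following rules an email address has to live up to in our system: it starts with a lowercase letter, followed by any number of word characters or dashes, then an @, then one or more lowercase letters, a dot, and one or more lowercase letters.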
[a-z][\w-]*@[a-z]+\.[a-z]+
Done? Did you remember to escape the dot? Another gotcha lies in the use of the +. After the @ we want to match one or more letters, not zero or more, so we use + rather than *.
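To try the pattern against a few addresses, something like the following Python sketch would do. `is_valid_email` is a hypothetical helper name, and the addresses are invented for the example.

```python
import re

EMAIL = re.compile(r"[a-z][\w-]*@[a-z]+\.[a-z]+")

def is_valid_email(text):
    # fullmatch makes sure nothing is left over before or after the address.
    return EMAIL.fullmatch(text) is not None

print(is_valid_email("john-doe_1@example.com"))  # True
print(is_valid_email("1john@example.com"))       # False (starts with a digit)
print(is_valid_email("john@examplecom"))         # False (no dot)
print(is_valid_email("john@.com"))               # False (+ needs at least one letter)
print(is_valid_email("John@example.com"))        # False (uppercase first letter)
```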
Now let's say that we want to change our rules and match only the big valid domains. So our email address has to end in either com or net. We can't do this with the syntax we've learned so far. The regex query will make use of 2 new concepts: the OR operator and groups.
[a-z][\w-]*@[a-z]+\.(com|net)
Let's break that down by example.
com|net # match com or net
a|b|c # same as [abc].
The group is needed to limit how far the | reaches. Without it, the OR splits the entire pattern into a left side and a right side. Say we wanted to match 'Brad Pitt' or 'Angelina Pitt' only:
Brad|Angelina Pitt # matches either 'Brad' or 'Angelina Pitt'
#Not What We wanted!
(Brad|Angelina) Pitt # now we got it
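Here is a Python sketch that compares the grouped and ungrouped versions, and then applies the same idea to the com/net email pattern. The variable names and sample strings are invented for the illustration.

```python
import re

ungrouped = re.compile(r"Brad|Angelina Pitt")
grouped = re.compile(r"(Brad|Angelina) Pitt")

# Without the group, the alternatives are "Brad" and "Angelina Pitt".
print(ungrouped.fullmatch("Brad") is not None)       # True
print(ungrouped.fullmatch("Brad Pitt") is not None)  # False

# With the group, only the first name varies.
print(grouped.fullmatch("Brad Pitt") is not None)    # True
print(grouped.fullmatch("Brad") is not None)         # False
print(grouped.fullmatch("Angelina Pitt").group(1))   # Angelina

# The same idea restricts the email pattern to two endings.
EMAIL_COM_NET = re.compile(r"[a-z][\w-]*@[a-z]+\.(com|net)")
print(EMAIL_COM_NET.fullmatch("john@example.com") is not None)  # True
print(EMAIL_COM_NET.fullmatch("john@example.org") is not None)  # False
```

As a side effect, the parentheses also remember what they matched, which is what `group(1)` reads back.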
The principle of grouping using parentheses should be relatively straightforward as a developer. In fact we can start combining it with some of the other operators we already know.
(dog)+ # match dog,dogdog,dogdogdog ...
java(bean)? # match java or javabean
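A quick way to check these two in Python, as a sketch with a made-up helper called `matches`:

```python
import re

def matches(pattern, text):
    """True if the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# + applies to the whole group, so "dog" must appear one or more times.
print(matches(r"(dog)+", "dog"))        # True
print(matches(r"(dog)+", "dogdogdog"))  # True
print(matches(r"(dog)+", ""))           # False
print(matches(r"(dog)+", "dogd"))       # False

# ? makes the whole group optional, but it may appear at most once.
print(matches(r"java(bean)?", "java"))          # True
print(matches(r"java(bean)?", "javabean"))      # True
print(matches(r"java(bean)?", "javabeanbean"))  # False
```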
The last part of this tutorial will cover greediness, word boundaries, find and replace, and anchors, with a few examples to put everything together. As always I highly recommend playing with it in the regex tester to get it in your fingers.