There is a tool for every job, and when it comes to searching text, regex is king. As a developer who deals with text day in and day out, I found that learning regex was one of the best things I did for my productivity. A few reasons why:

  1. It crosses over programming languages, which makes it a platform- and language-independent investment. The basic regex syntax in Java, JavaScript, Ruby or .NET is... largely the same.
  2. Find and replace becomes immensely more powerful.
  3. Regex abides by the 80/20 principle. Learning only the first 20% is enough to get you through 80% of the problems you will face.

I've set up a demo page where you can practice some quick regex queries to go along with the tutorial. But firing up any text editor or IDE and hitting Ctrl-F will let you do the same (you may need to switch on the regex option in the search box).

Getting Started - The [box]

The simplest regex query would be
bank
Which searches for... yup - bank. Let's modify it a little and say we want to find both bank and tank in a block of text.
[bt]ank  # matches bank or tank

The [box] here represents 1 character. So regex is still searching for just 4 characters, but the first character is now either b or t. The characters within the [] are in an OR relationship. The order is irrelevant: [tb]ank is semantically identical.
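
To see the [box] in action outside an editor, here is a minimal sketch in Python (chosen only for illustration; the sample sentence is made up). It assumes nothing beyond the standard `re` module.

```python
import re

# Made-up sample text for illustration.
text = "The bank sent a tank to rank the banks."

# [bt] stands for exactly one character, either b or t, followed by "ank".
matches = re.findall(r"[bt]ank", text)

# Swapping the order inside the box should not change anything.
swapped = re.findall(r"[tb]ank", text)

print(matches)   # ['bank', 'tank', 'bank']
print(swapped)   # ['bank', 'tank', 'bank']
```

Note that "rank" is skipped because r is not in the box, and the last hit is the "bank" inside "banks".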

Below are a few more samples using the [box]

[abc]1              # matches a1, b1 or c1
[cba]1              # matches a1, b1 or c1
file[0123456789]    # matches file0,file1,file2 ... or file9
file[0-9]           # matches file0,file1,file2 ... or file9
[a-z]               # matches a or b or c or ... z

As you may have noticed in the examples above, we introduced a new operator. By using the '-' character we define a range. This comes in handy to avoid typing absurdly long [abcdefghijklmnopqrstuvwxyz] type constructs. It's just a shorthand.
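
A quick way to convince yourself that a range really is just shorthand is to run both forms over the same input. The sketch below uses Python's `re` module; the list of names is invented for the example, and `fullmatch` is used so the whole string has to fit the pattern.

```python
import re

# Hypothetical file names, just for the demonstration.
names = ["file0", "file7", "fileA", "file", "file12"]

shorthand = re.compile(r"file[0-9]")
longhand = re.compile(r"file[0123456789]")

# fullmatch requires the entire string to match, so "file12" is rejected:
# the box still stands for exactly one character.
shorthand_hits = [n for n in names if shorthand.fullmatch(n)]
longhand_hits = [n for n in names if longhand.fullmatch(n)]

print(shorthand_hits)  # ['file0', 'file7']
print(longhand_hits)   # ['file0', 'file7']
```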

Moving on - Quantifiers

Let's go further with our initial example. Say we want to find tank, bank, tanks, and banks. That would look like this:

[bt]anks?

We've added a quantifier operator, the '?', which acts on the character directly to its left. It means 0 or 1 times, so in essence we are saying: look for [bt]ank with or without an s on the end.

A quantifier can be added to any character or even the [box]. Once again examples speak louder than words, so take a look at the samples below:

[bt]anks         # matches banks or tanks
[bt]anks?        # matches bank, tank, banks or tanks
[bt]?ank         # matches bank, tank or ank
ab?c?            # matches a, ab, abc or ac
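
The samples above can be checked with a short Python sketch. The word lists are made up, and `re.fullmatch` is used so that each word must match the pattern from start to end.

```python
import re

def accepted_by(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

# The s is optional, but only one s is allowed.
accepted = accepted_by(r"[bt]anks?", ["bank", "tank", "banks", "tanks", "ank", "bankss"])

# Here the whole [box] is optional.
optional_first = accepted_by(r"[bt]?ank", ["bank", "tank", "ank", "rank"])

# Two independent optional characters.
optional_both = accepted_by(r"ab?c?", ["a", "ab", "abc", "ac", "abb", "b"])

print(accepted)        # ['bank', 'tank', 'banks', 'tanks']
print(optional_first)  # ['bank', 'tank', 'ank']
print(optional_both)   # ['a', 'ab', 'abc', 'ac']
```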

To quickly summarize: when we define a character without a quantifier, we are stating we want it exactly once. When we add a quantifier, we change how many times we want it to appear. Below are the possible quantifiers with matching examples:

quantifier   behavior          example regex   example matches
?            zero or once      abc?            ab, abc
*            zero or more      abc*            ab, abc, abcc, abccc, abcccc, ... etc
+            once or more      abc+            abc, abcc, abccc, abcccc, ... etc
{n}          exactly n times   abc{2}          abcc
{n,m}        n to m times      abc{2,3}        abcc, abccc
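
Each row of the table can be tried against the same handful of candidate strings. This is only a sketch in Python; the candidate list is an assumption picked to cover the cases, and `fullmatch` again forces the whole string to match.

```python
import re

# Candidate strings chosen to exercise every row of the table.
candidates = ["ab", "abc", "abcc", "abccc", "abcccc"]

patterns = [r"abc?", r"abc*", r"abc+", r"abc{2}", r"abc{2,3}"]

# Map each pattern to the candidates it matches in full.
# The quantifier only applies to the c directly to its left.
results = {
    p: [s for s in candidates if re.fullmatch(p, s)]
    for p in patterns
}

for p, hits in results.items():
    print(p, hits)
# abc? ['ab', 'abc']
# abc* ['ab', 'abc', 'abcc', 'abccc', 'abcccc']
# abc+ ['abc', 'abcc', 'abccc', 'abcccc']
# abc{2} ['abcc']
# abc{2,3} ['abcc', 'abccc']
```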

This gives us some basic syntax to play around with, and I would recommend trying it out in an IDE or on the demo page.

Part 2 of this tutorial is now up
