agillo.net

Regex Primer: Part 1

There is a tool for every job, and when it comes to searching text, regex is king. As a developer who deals with text day in and day out, I found that learning regex was one of the best things I did to boost my productivity. Here is why:

  1. It crosses programming languages, which makes it a platform- and language-independent investment. The core regex syntax in Java, JavaScript, Ruby or .NET is largely the same, even though each flavor adds its own extras.
  2. Find and replace becomes immensely more powerful.
  3. Regex follows the 80/20 principle: learning only the first 20% is enough to get you through 80% of the problems you will face.

You can use the regex demo page to try out quick regex queries as you follow along with the tutorial. Firing up any text editor or IDE and hitting Ctrl-F works just as well, as long as you switch its search to regex mode (usually a toggle in the search box).
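If you would rather follow along in code, here is a minimal sketch using Python's built-in `re` module. Python is just my choice for these examples; the basic syntax in this tutorial works the same way there. The sample sentence is made up. Note that the plain query `bank` also matches inside "riverbank", because a regex matches any occurrence of the pattern, not just whole words.

```python
import re

# Hypothetical sample text for illustration
text = "I went to the bank, then walked along the riverbank."

# A plain word is the simplest regex: it matches that exact sequence of characters
matches = re.findall(r"bank", text)
print(matches)  # ['bank', 'bank']
```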

Getting Started - The [box]

The simplest regex query is just a plain word:

bank

Which searches for... yup, bank. Let's modify it a little: say we want to find both bank and tank in a block of text.

[bt]ank  # matches bank or tank

The [box] here (formally called a character class) represents exactly one character. So the regex still matches just 4 characters, but the first one can now be either b or t. Any character within the [] is in an OR relationship, and the order is irrelevant: [tb]ank is semantically identical.
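A quick way to convince yourself of both points (one character, order irrelevant) is to run the two bracket orders against the same text. This is a sketch in Python's `re` module with a made-up sentence; "sank" is left out because s is not in the [box].

```python
import re

# Hypothetical sample text for illustration
text = "The bank near the tank sank."

# [bt] matches exactly one character, either b or t
bt_matches = re.findall(r"[bt]ank", text)
# Swapping the order inside the brackets changes nothing
tb_matches = re.findall(r"[tb]ank", text)

print(bt_matches)                # ['bank', 'tank']
print(bt_matches == tb_matches)  # True
```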

Below are a few more samples using the [box]:

[abc]1              # matches a1, b1 or c1
[cba]1              # matches a1, b1 or c1
file[0123456789]    # matches file0,file1,file2 ... or file9
file[0-9]           # matches file0,file1,file2 ... or file9
[a-z]               # matches a or b or c or ... z

As you may have noticed in the examples above, we introduced a new operator: inside the [box], the '-' character defines a range. This comes in handy to avoid typing absurdly long constructs like [abcdefghijklmnopqrstuvwxyz]. It's just a shorthand.
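To check that a range really is just shorthand, the sketch below compares file[0-9] with the spelled-out file[0123456789] in Python. The file names are hypothetical, and I use `re.fullmatch` (the whole string must fit the pattern) to keep the demo simple; a Ctrl-F search would also find "file1" inside "file10".

```python
import re
import string

# Hypothetical file names for illustration
names = ["file0", "file7", "fileA", "file10"]

# fullmatch: the pattern has to cover the entire string
short_form = [n for n in names if re.fullmatch(r"file[0-9]", n)]
long_form = [n for n in names if re.fullmatch(r"file[0123456789]", n)]
print(short_form)               # ['file0', 'file7']
print(short_form == long_form)  # True

# [a-z] accepts every lowercase ASCII letter, just like spelling them all out
all_lower = all(re.fullmatch(r"[a-z]", ch) for ch in string.ascii_lowercase)
print(all_lower)    # True

# Ranges are case-sensitive by default, so an uppercase letter is rejected
upper_match = re.fullmatch(r"[a-z]", "A")
print(upper_match)  # None
```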

Moving on - Quantifiers

Let's go further with our initial example. Say we want to find tank, bank, tanks, and banks. That would look like this:

[bt]anks?

We've added a quantifier, '?', which acts on the character directly to its left. It means 0 or 1 times, so in essence we are saying: look for [bt]ank, with or without an s on the end.
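Here is [bt]anks? run over a short, made-up sentence in Python. The optional s is taken when it is there ("tanks", "banks") and skipped when it is not ("bank").

```python
import re

# Hypothetical sample text for illustration
text = "The tanks rolled past two banks and one bank."

# s? makes the trailing s optional: zero or one occurrence
found = re.findall(r"[bt]anks?", text)
print(found)  # ['tanks', 'banks', 'bank']
```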

A quantifier can be added to any character or even the [box]. Once again, examples speak louder than words, so take a look at the samples below:

[bt]anks         # matches banks or tanks

[bt]anks?        # matches bank, tank, banks or tanks

[bt]?ank         # matches bank, tank or ank

ab?c?            # matches a, ab, abc or ac
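The samples above can be checked mechanically. The sketch below tests each pattern against a hypothetical list of candidate strings using Python's `re.fullmatch`, so a candidate only counts if the whole string fits the pattern. The helper name `full_matches` is my own, not part of any library.

```python
import re

# Hypothetical candidate strings for illustration
candidates = ["a", "ab", "ac", "abc", "abcc", "ank", "bank", "tank", "banks"]

def full_matches(pattern):
    # Keep only the candidates that match the pattern from start to end
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(full_matches(r"[bt]anks"))   # ['banks']
print(full_matches(r"[bt]anks?"))  # ['bank', 'tank', 'banks']
print(full_matches(r"[bt]?ank"))   # ['ank', 'bank', 'tank']
print(full_matches(r"ab?c?"))      # ['a', 'ab', 'ac', 'abc']
```

Note that "abcc" is rejected by ab?c?: the ? allows at most one c.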

To quickly summarize: when we write a character without a quantifier, we are stating that we want it exactly once. Adding a quantifier changes how many times we want it to appear. Below are the common quantifiers with matching examples:

quantifier   behavior          example regex   example matches
?            zero or one time  abc?            ab, abc
*            zero or more      abc*            ab, abc, abcc, abccc, abcccc, ...
+            one or more       abc+            abc, abcc, abccc, abcccc, ...
{n}          exactly n times   abc{2}          abcc
{n,m}        n to m times      abc{2,3}        abcc, abccc
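To see every row of the table at once, the sketch below runs each example regex against the same hypothetical sample strings with Python's `re.fullmatch`. Each quantifier applies only to the c on its left, which is why "ab" survives abc? and abc* but not abc+.

```python
import re

# Hypothetical sample strings for illustration
samples = ["ab", "abc", "abcc", "abccc", "abcccc"]
patterns = [r"abc?", r"abc*", r"abc+", r"abc{2}", r"abc{2,3}"]

# For each pattern, keep the samples it matches from start to end
results = {p: [s for s in samples if re.fullmatch(p, s)] for p in patterns}

for pattern, matched in results.items():
    print(f"{pattern:10} -> {matched}")
```

Running it shows abc* and abc+ accepting "abcccc" (no upper limit), while abc{2,3} stops at "abccc".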

This gives us some basic syntax to play around with, and I would recommend trying it out in an IDE or on the demo page.

Part 2 of this tutorial is now up
