SlightlyLoony
Tera Contributor

A common use for regular expressions is to extract some text from a larger piece of text, based on some delimiters that define the range of text to pull out. For instance, I might want to extract the text surrounded by parentheses, as in this sentence:
This is a test to see if (maybe) we can extract the text within some (pairs of) parentheses.
This is easy to do with regular expressions, but it involves a trick or two. Do you know how to do it?

Here's how you might try doing this (especially if you've never run into this particular challenge before):


var text = 'This is a test to see if (maybe) we can extract the text within some (pairs of) parentheses.';

var regex = /\((.*)\)/g;

var match;
while (match = regex.exec(text)) {
    gs.log(match[1]);
}

The regex variable contains the search for parentheses with anything at all between them. The "\(" and "\)" specify that we're looking for literal parentheses (we have to escape them with a backslash because parentheses are special characters in regular expressions). The unescaped parentheses define our capture group, wherein we hope to capture what's between the parentheses in our input string. The "g" at the end makes it a global regular expression: each time we call exec() on the regex variable, the search resumes from where the previous match ended, and exec() returns null once there are no more matches. When we run it, we might expect to see this output:

maybe
pairs of

But instead we get this:

maybe) we can extract the text within some (pairs of

What's going on here?

The answer lies in the ".*" inside our capture parentheses. The dot means "match any character" (other than a line break) and the asterisk means "match any number of them" (from 0 to infinity). Ok, you say, that's exactly what I meant! But what you might not know is that the asterisk is a greedy quantifier: it's going to match as many characters as possible while still letting the rest of the pattern match. So what's happening is that it matches all the characters between the first "(" and the last ")". It's a greedy little bugger.
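To see the greediness directly, it can help to look at the whole match and where it starts, not just the capture group. The snippet below is a minimal sketch in plain JavaScript. The helper name describeFirstMatch is made up for illustration, and it returns an object rather than logging, so it runs outside an instance too (in a server-side script you could hand the fields to gs.log).

```javascript
// Hypothetical helper (not part of any API): runs a regex once and
// reports where the match starts, the full matched text, and group 1.
function describeFirstMatch(text, regex) {
    var match = regex.exec(text);
    if (match === null) {
        return null;
    }
    return {
        index: match.index, // position where the match begins
        whole: match[0],    // everything the pattern consumed
        group: match[1]     // what the capture parentheses caught
    };
}

var text = 'This is a test to see if (maybe) we can extract the text within some (pairs of) parentheses.';

// No "g" flag here, so exec() always starts from the beginning of the text.
var greedy = describeFirstMatch(text, /\((.*)\)/);
```

With the sample sentence, greedy.index is the position of the first "(", and greedy.whole runs all the way to the last ")", which is why there is only one match to loop over.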

Fortunately, regular expressions include the opposite of greedy quantifiers: reluctant quantifiers (you'll also see them called lazy or non-greedy). These do exactly what we need in this circumstance: they match the fewest characters they can while still making the match. Turning a greedy quantifier into a reluctant quantifier is easy: just append a "?", as below:

var text = 'This is a test to see if (maybe) we can extract the text within some (pairs of) parentheses.';

var regex = /\((.*?)\)/g;

var match;
while (match = regex.exec(text)) {
    gs.log(match[1]);
}

If you run that code, you'll get exactly what you wanted...
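If you find yourself doing this a lot, the loop can be wrapped up in a small function. What follows is only a sketch in plain JavaScript: extractAll is a made-up name, and it returns an array instead of calling gs.log so that it can be run anywhere. The second pattern is a different technique from the reluctant quantifier (a negated character class, "[^)]*", meaning "anything that isn't a closing parenthesis"); it isn't what the code above uses, but it happens to give the same result on this sentence.

```javascript
// Hypothetical helper: collects capture group 1 from every match.
function extractAll(text, regex) {
    // Without the "g" flag exec() would return the same match forever.
    if (!regex.global) {
        throw new Error('extractAll needs a regex with the "g" flag');
    }
    var results = [];
    var match;
    regex.lastIndex = 0; // start at the beginning even if the regex was used before
    while ((match = regex.exec(text)) !== null) {
        results.push(match[1]);
        // A zero-length match would not advance on its own, so nudge it along.
        if (match[0].length === 0) {
            regex.lastIndex++;
        }
    }
    return results;
}

var text = 'This is a test to see if (maybe) we can extract the text within some (pairs of) parentheses.';

var reluctant = extractAll(text, /\((.*?)\)/g);   // reluctant quantifier
var negated = extractAll(text, /\(([^)]*)\)/g);   // negated character class
var greedyAll = extractAll(text, /\((.*)\)/g);    // greedy, for comparison
```

Here reluctant and negated both come back as the two pieces we were after, "maybe" and "pairs of", while greedyAll holds the single oversized match. One difference worth knowing: the dot doesn't match line breaks, so the negated character class will also pick up parenthesized text that spans more than one line.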