SlightlyLoony

The young lady at right is quite pleased with herself. Why, you ask? It's because she just discovered that JavaScript (by far her favorite programming language) lets you replace matches of a regular expression with the results of a function that returns a string, and that function has full access to the string, the matches, and any capture groups in the match.

She's a precocious young lady, isn't she? Imagine knowing what a capture group was at her age!

This is one of those things that's best explained by example. Let's suppose you wanted your program to format a column of numbers you've received in an array. You want the result to be right-justified, and you want commas to separate each group of three digits from the right. How might you write a program to do this, and why on earth would a replacement function be of use in doing so?

Here's how we implemented it:


// call our formatter with an array of arbitrary numbers...
formatter( [5,375,4982,2347023,47632,6128,724592] );

// format the given array of numbers into a column of right-justified (in a field 12 characters wide), comma-grouped numbers...
function formatter(nums) {
    var text = nums.join('\n');
    var ans = text.replace(/(\d+)/g, nfRep); // MAGIC: the replacement function nfRep() in use...
    gs.log('Unformatted:\n' + text);
    gs.log('Formatted:\n' + ans);
}

// the replacement function
// the second parameter is all we care about; it contains the first capture group when we're invoked,
// which holds the number string we need to format...
function nfRep(match, num) {
    // first we put our commas in...
    var x = num;
    var same = false;
    while (!same) {
        var x1 = x.length;
        x = x.replace(/(\d)(\d\d\d)(?=(,|$))/, '$1,$2');
        same = (x1 == x.length);
    }
    // now we right-justify it: prepend 11 spaces, keep the rightmost 12 characters, and we're done...
    x = '           ' + x;
    return x.substr(x.length - 12);
}

Please don't run screaming from the room! It's not all that bad!

The first thing this code does is to invoke the formatter() function with the array of numbers we want to format. The function turns the array into a string with a number on each line, then does the magic part by replacing each number with the results of the nfRep() function — the replacement function. The rest just prints out the before and after strings so you can see what they look like.

Replacement functions are called once for each match of the regular expression. In this case, our regular expression /(\d+)/g matches strings of one or more digits. It's global, so it will match every occurrence of a string of digits. In our example we have 7 numbers in the array, so this regular expression will match 7 times, once for each number. That means the replacement function will also be called 7 times, once for each number.
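If you'd like to see the "once per match" behavior for yourself, here is a minimal sketch. The name countingRep is made up for this illustration, and it uses console.log rather than gs.log on the assumption that you're trying it outside of an instance (in Node.js or a browser console, say).

```javascript
// Count how often the replacement function runs for the article's input.
var text = [5, 375, 4982, 2347023, 47632, 6128, 724592].join('\n');
var calls = 0;

// countingRep is a hypothetical name; it returns each match unchanged.
function countingRep(match) {
    calls++;
    return match;
}

var result = text.replace(/(\d+)/g, countingRep);
console.log(calls);           // 7, one call per number
console.log(result === text); // true, since every match was replaced by itself
```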

When replacement functions are called, the number and meaning of the arguments passed in depend on the regular expression.

The first argument is always present, and it contains the string that was matched. In our example, the first time the replacement function was called, the first argument would contain a '5', the second time a '375', and so on for all 7 numbers.

Next there is one argument for every parenthesized capture group in the regular expression. Our example has one such capture group (the (\d+)), so there's only one of these arguments in our case. Because our example's capture group is exactly the same as the entire matched text in the regular expression, the second argument's value is the same as the first argument's value. Silly, I know — but I wanted to show you how these capture group arguments work. If you use a more complex regular expression, with multiple capture groups, these capture group arguments are extremely useful.
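To give a feel for the multiple-capture-group case, here's a small sketch that has nothing to do with our number formatter. The date pattern, the sample text, and the name swapDate are all invented for the illustration; the point is only that each capture group shows up as its own argument, in order.

```javascript
// swapDate is a hypothetical replacement function for a pattern with
// three capture groups. It receives the whole match first, then one
// argument per capture group, in the order the groups appear.
function swapDate(match, year, month, day) {
    return month + '/' + day + '/' + year;
}

var input = 'Opened 2011-03-15, closed 2011-04-02';
var output = input.replace(/(\d{4})-(\d{2})-(\d{2})/g, swapDate);
console.log(output); // Opened 03/15/2011, closed 04/02/2011
```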

Finally, there are two more arguments after all the capture group arguments: first the offset at which the match was found, and then the entire string being searched. In our example, we don't use either of them, but if we did, the last one would contain the seven numbers separated by newlines.
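One way to check exactly what a replacement function receives is to record its whole argument list. The sketch below does that for the first three of our numbers; spyRep is a made-up name, and console.log is assumed in place of gs.log. Notice that a numeric offset (the position of the match) sits between the capture group and the full string.

```javascript
var seen = [];

// spyRep is a hypothetical name; it records every argument it is given
// and returns the match unchanged.
function spyRep() {
    seen.push(Array.prototype.slice.call(arguments));
    return arguments[0];
}

'5\n375\n4982'.replace(/(\d+)/g, spyRep);

// Each entry is [match, captureGroup1, offset, wholeString].
console.log(JSON.stringify(seen[1])); // ["375","375",2,"5\n375\n4982"]
```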

Then at last there's our replacement function: nfRep().

The first bit inside the while loop looks trickier than it really is. Older JavaScript engines don't support look-behinds in regular expressions (they were only added to the language in ES2018), so we worked around this by repeatedly looking for a stretch of four digits followed by either the end of the string or a comma (that's what the /(\d)(\d\d\d)(?=(,|$))/ is all about). If we find a match on that, then we know we need a comma between the first and second digits of our match, so we use a simple replacement to do that. We simply repeat that until there's no change in the string's length, which tells us that we didn't find any more places to stuff in a comma.
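If the loop is still fuzzy, it may help to watch it work. The sketch below reuses the same regular expression but records the string after each comma goes in; commaSteps is a made-up helper for the illustration. For comparison, addCommas shows a commonly used single-pass idiom built on a look-ahead. That one is not the approach used in this post, just an alternative you may run into.

```javascript
// commaSteps is a hypothetical helper: it runs the same loop as nfRep()
// and records the string after each successful comma insertion.
function commaSteps(num) {
    var steps = [];
    var x = num;
    var same = false;
    while (!same) {
        var before = x.length;
        x = x.replace(/(\d)(\d\d\d)(?=(,|$))/, '$1,$2');
        same = (before == x.length);
        if (!same) {
            steps.push(x);
        }
    }
    return steps;
}

console.log(commaSteps('2347023')); // [ '2347,023', '2,347,023' ]
console.log(commaSteps('375'));     // []

// Alternative (not the post's method): insert a comma at every position
// inside the number that is followed by a whole multiple of three digits.
function addCommas(num) {
    return num.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(addCommas('2347023')); // 2,347,023
```

Commas are added from the right: the first pass finds the last four digits, and each later pass finds the four digits sitting just before the comma that was inserted previously.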

The last bit is trivial: we just prepend 11 spaces to our number string (in case we had a short number) and return the 12 righthand-most characters. That returned string replaces the originally matched string of digits found by the regular expression in formatter().
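Here is the padding step pulled out on its own, as a sketch. The helper name rightJustify and its width parameter are invented for the illustration (the post hard-codes a width of 12). One thing worth knowing: because only the rightmost characters are kept, a string longer than the field loses its leading characters.

```javascript
// rightJustify is a hypothetical helper: pad on the left with spaces,
// then keep only the rightmost `width` characters.
function rightJustify(s, width) {
    var padded = s;
    while (padded.length < width) {
        padded = ' ' + padded;
    }
    return padded.slice(padded.length - width);
}

console.log('[' + rightJustify('4,982', 12) + ']');             // [       4,982]
console.log('[' + rightJustify('1,234,567,890,123', 12) + ']'); // [,567,890,123]
```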

Voila! You are now either drooling at your desk, with glazed eyes rotating independently of each other — or you're a certified replacement function expert! Our young lady is definitely in the latter category...

So what happens when we run this puppy? I thought you'd never ask:

Unformatted:
5
375
4982
2347023
47632
6128
724592

Formatted:
           5
         375
       4,982
   2,347,023
      47,632
       6,128
     724,592

Now isn't that just the cutest thing you ever did see?