Regular Expressions Part 4

(Pssst! All the code referenced in this post can be found in https://github.com/NerdcoreSteve/regular-expressions. Pass it on!)

This is part 4 in a series on regular expressions. Have a look at part 1, part 2, and part 3 if you haven't already.

Capturing Groups

What if I want to capture a specific part of a string based on the characters surrounding it? This is something I often use a regex for:

console.log(  
    'I get a banana. You get a kiwi. Your mom is a potato'
        .match(/get a (\w+)/))
[ 'get a banana',
  'banana',
  index: 2,
  input: 'I get a banana. You get a kiwi. Your mom is a potato' ]

That's great! The first element of the array/object thing is the full regex match, the second is the bit in the parentheses. This is called group capture.
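Here's a small sketch of how you might consume that return value. The helper name firstFruit is mine, not something from the repo. The thing to watch for is that .match returns null when nothing matches, so it's worth guarding before you pull the group out:

```javascript
// Hypothetical helper: returns the word captured after "get a", or null.
const firstFruit = input => {
    const result = input.match(/get a (\w+)/)
    // .match returns null when the regex doesn't match at all
    if (result === null) return null
    // result[0] is the full match, result[1] is the first capture group
    const [, fruit] = result
    return fruit
}

console.log(firstFruit('I get a banana. You get a kiwi.')) // banana
console.log(firstFruit('No fruit here.')) // null
```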

But what if I wanted to do a global group capture?

console.log(  
    'I get a banana. You get a kiwi. Your mom is a potato'
        .match(/get a (\w+)/g))
[ 'get a banana', 'get a kiwi' ]

Hey, what gives? Where're my group captures?

Well, it turns out that you need to use a regex's .exec method if you want them all. Here's a function that'll take care of things for you:

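const regexGlobalGroupCapture = (regex, input) => {
    // regex needs the g flag, otherwise .exec never advances and this loops forever
    const output = []
    let matches
    // .exec returns null once there are no more matches
    while ((matches = regex.exec(input)) !== null) {
        output.push(matches[1])
    }
    return output
}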

Lots of mutation under the hood but it acts like a pure function from the outside. Here it is in action:

console.log(  
    regexGlobalGroupCapture(
        /get a (\w+)/g,
        "I get a banana. You get a kiwi. Your mom is a potato"))
[ 'banana', 'kiwi' ]

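To explain how regexGlobalGroupCapture works:
It uses the .exec method of the regular expression object. Every time .exec is called on a regex with the g flag, it returns the next match, picking up where the previous call left off, and it returns null once there are no matches left. We push the first capture group of each match onto an array and return it.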
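If you're curious how the regex remembers where it left off, it keeps its position in a lastIndex property. Here's a rough sketch that logs it after each call (the variable names fruitRegex, sentence, and steps are just for illustration):

```javascript
const fruitRegex = /get a (\w+)/g
const sentence = 'I get a banana. You get a kiwi. Your mom is a potato'

// Record the captured word and where the regex will resume searching
const steps = []
let match
while ((match = fruitRegex.exec(sentence)) !== null) {
    steps.push({ captured: match[1], resumeAt: fruitRegex.lastIndex })
    console.log(match[1], fruitRegex.lastIndex)
}
// banana 14
// kiwi 30

// After the failed final call, lastIndex is reset to 0
console.log(fruitRegex.lastIndex) // 0
```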

Why do we have to use the .exec method? Because JavaScript is wacky y'all. Sometimes you just have to do these things and the only way you find out is by googling. :)
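If your runtime supports String.prototype.matchAll (it was added in ES2020, so this assumes a reasonably recent Node or browser), there's another way to get at the groups. This is only a sketch of the alternative, not what the repo uses:

```javascript
const text = 'I get a banana. You get a kiwi. Your mom is a potato'

// matchAll requires the g flag and yields one match array per match,
// each with its capture groups intact
const fruits = Array.from(text.matchAll(/get a (\w+)/g), m => m[1])

console.log(fruits) // [ 'banana', 'kiwi' ]
```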

Replacing

A common use for regexes is modifying strings. We can use the .replace method for that:

console.log(  
    "name: Clark Kent, occupation: Reporter"
        .replace(/name: (\w+ \w+), occupation: (\w+)/, '$1 is a $2'))
Clark Kent is a Reporter  

Here I'm grabbing a name and occupation from a string with a particular format. Then I'm replacing that string with a wholly different string, but it includes the sub-strings I've group captured. The $1 represents the first group, and $2 represents the second group.
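Two variations you might find handy, sketched with made-up data (the second record and the variable names are my own). Adding the g flag replaces every match instead of just the first, and passing a function instead of a string lets you transform the groups before they go back in:

```javascript
const records = [
    'name: Clark Kent, occupation: Reporter',
    'name: Bruce Wayne, occupation: Billionaire'
].join('\n')

// With the g flag, every matching record gets replaced
const described = records.replace(
    /name: (\w+ \w+), occupation: (\w+)/g,
    '$1 is a $2')
console.log(described)
// Clark Kent is a Reporter
// Bruce Wayne is a Billionaire

// A replacer function receives the full match, then each capture group
const lowercased = records.replace(
    /name: (\w+ \w+), occupation: (\w+)/g,
    (fullMatch, name, job) => `${name} is a ${job.toLowerCase()}`)
console.log(lowercased)
// Clark Kent is a reporter
// Bruce Wayne is a billionaire
```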

Group capture can be pretty powerful. As you may know, I've been writing posts about the Free Code Camp algorithmic challenges, and I've been keeping my solutions in a git repo for all to see.

Well, in that git repo, I've written tests to make sure my solutions work. I write these tests using text from the Free Code Camp site itself.

For example, part of the description for the Spinal Tap Case challenge reads:
spinalCase("This Is Spinal Tap") should return "this-is-spinal-tap".

I can use regular expressions to turn this into a Jasmine test in my repo:

console.log(  
    'spinalCase("This Is Spinal Tap") should return "this-is-spinal-tap".'
        .replace(
            /(spinalCase\("([\w\s]+)"\) should return "([\w\s\-]+)"\.)/,
            '    it(\'$1\',\n        () => expect(spinalCase(\'$2\')).toEqual(\'$3\'))'))
    it('spinalCase("This Is Spinal Tap") should return "this-is-spinal-tap".',
        () => expect(spinalCase('This Is Spinal Tap')).toEqual('this-is-spinal-tap'))

The first thing to note about this regex is that it captures 3 groups.

First, the whole thing is surrounded by parentheses, so I can put it in the test description. That's the first group.

Second is the argument to spinalCase, which is ([\w\s]+). It's a character class including word characters and white space.

Note that I had to escape the parentheses I wanted to match: \("([\w\s]+)"\).

Third and finally, I captured the expected return value ([\w\s\-]+). It's a character class that includes word characters, white space, and the dash character. I escaped the dash because - normally describes a range inside a character class. At the very end of the class the escape is optional, but it makes the intent obvious.

This is just a sample of how you can do really complicated text processing with just a little code, once you get familiar with regular expressions.
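If you wanted to reuse this for other challenges, one way to go is capturing the function name too. What follows is only a sketch under my own assumptions: toJasmineTest is a hypothetical helper, it only handles a single double-quoted string argument, and the generated test calls the captured function on the input. Lines that don't fit the pattern come back unchanged.

```javascript
// Hypothetical helper: turns one 'fn("input") should return "output".' line
// into a Jasmine test. Group 1 is the whole line, group 2 the function name,
// group 3 the argument, and group 4 the expected return value.
const toJasmineTest = line =>
    line.replace(
        /^((\w+)\("([\w\s]+)"\) should return "([\w\s\-]+)"\.)$/,
        "    it('$1',\n        () => expect($2('$3')).toEqual('$4'))")

console.log(
    toJasmineTest(
        'spinalCase("This Is Spinal Tap") should return "this-is-spinal-tap".'))
//     it('spinalCase("This Is Spinal Tap") should return "this-is-spinal-tap".',
//         () => expect(spinalCase('This Is Spinal Tap')).toEqual('this-is-spinal-tap'))
```

A description containing a single quote would break the generated string, so real input would need escaping first.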

Splitting a String

We can also split a string using regular expressions, which can be pretty useful:

console.log(  
    '1. potatoes are my friends 2. You are a potato 3. You are my friend 4. Now and always you are my potato friend'
        .split(/\d+\./))
[ '',
  ' potatoes are my friends ',
  ' You are a potato ',
  ' You are my friend ',
  ' Now and always you are my potato friend' ]

Here we split on every number followed by a dot, which gets us an array of sentences (plus an empty string at the front, since nothing comes before the first number). Using this technique, you can easily turn a numbered list in plain text into an HTML ordered list.

Check it out:

['1. potatoes are my friends 2. You are a potato 3. You are my friend 4. Now and always you are my potato friend']
    .map(s => s.split(/\d+\./))
    .map(xs => xs.slice(1))
    .map(xs => xs.map(s => `    <li>${s}</li>`))
    .map(xs => ['<ol>'].concat(xs).concat(['</ol>']))
    .map(xs => xs.join('\n'))
    [0]
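For comparison, here's roughly the same idea as a plain function, with the output shown. The name toOrderedList is mine, and this version also trims the stray spaces around each item, which the pipeline above leaves in:

```javascript
// Hypothetical helper: turns "1. foo 2. bar" into an HTML ordered list.
const toOrderedList = text =>
    ['<ol>']
        .concat(
            text
                .split(/\d+\./)
                .map(s => s.trim())
                // drop the empty string that precedes the first number
                .filter(s => s !== '')
                .map(s => `    <li>${s}</li>`))
        .concat(['</ol>'])
        .join('\n')

console.log(toOrderedList('1. potatoes are my friends 2. You are a potato'))
// <ol>
//     <li>potatoes are my friends</li>
//     <li>You are a potato</li>
// </ol>
```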

I could've used Ramda's pipe function, but I decided to use a JavaScript array like a Box functor, just like in Professor Frisby's excellent tutorial. (I highly recommend watching the whole thing if you haven't already.)

Move on to part 5!
