Capturing Groups

This example demonstrates how you can add capturing groups to a regular expression to figure out which part of the regular expression found the match. You can find this example as “Capturing groups” in the RegexMagic library.

For this example, we’ll continue with the regular expression created in the example about matching unrelated items using alternation. That regular expression matches a number or an email address. Using this regex with a “find all” command, we can get a list of all numbers and email addresses.

Now we want to use this regex to iterate over all the numbers and email addresses in a file and separate the two kinds of match, without using a second regular expression to check whether each match is a number or an email address. We can achieve this by placing capturing groups around the parts of the regex that match the number and the email address. Since each match of our regex is either one number or one email address, only one of the capturing groups captures any text in a given match. If the group for the number captured text, we know we have a number. If not, the group for the email address has captured the email address.

You could actually achieve this with just one capturing group for the number. When the group for the number doesn’t capture anything, retrieving the overall regex match gives the email address. But in this example we’ll create two groups just for practice.
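To make the two approaches concrete, here is a minimal JavaScript sketch. It assumes the unnamed-group form of the regex that this example generates for JavaScript further down. The function names `splitWithTwoGroups` and `splitWithOneGroup` and the sample text are made up for illustration, and the `/` inside the character class is escaped only because the pattern is written as a regex literal.

```javascript
// Two capturing groups: group 1 = number, group 2 = email address.
const twoGroups = /([0-9]+)|([\d!#$%&'*+.\/=?_`a-z{|}~^-]+@[\d.a-z-]+\.[a-z]{2,63})/gi;
// One capturing group: only the number alternative is captured.
const oneGroup = /([0-9]+)|[\d!#$%&'*+.\/=?_`a-z{|}~^-]+@[\d.a-z-]+\.[a-z]{2,63}/gi;

function splitWithTwoGroups(text) {
  const numbers = [];
  const emails = [];
  for (const m of text.matchAll(twoGroups)) {
    // Exactly one of the two groups participates in each match.
    if (m[1] !== undefined) numbers.push(m[1]);
    else emails.push(m[2]);
  }
  return { numbers, emails };
}

function splitWithOneGroup(text) {
  const numbers = [];
  const emails = [];
  for (const m of text.matchAll(oneGroup)) {
    // No number captured, so the overall match m[0] is the email address.
    if (m[1] !== undefined) numbers.push(m[1]);
    else emails.push(m[0]);
  }
  return { numbers, emails };
}

// Hypothetical sample text.
const sample = "Order 42 went to jane@example.com, 7 copies to bob@example.org";
console.log(splitWithTwoGroups(sample));
console.log(splitWithOneGroup(sample));
```

Both functions should sort the same matches into the same two lists. The two-group version simply makes the email case explicit instead of inferring it from the absence of a number.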

  1. On the Library panel, load RegexMagic.rml if it isn’t loaded already.
  2. In the list of RegexMagic formulas in the library, select “Fields: alternation single”.
  3. Click the Use button to populate the Samples, Match, and Action panels with the settings from the RegexMagic formula we just loaded. These are the settings we made in the example about matching unrelated items using alternation.
  4. On the Action panel, click the New button to add a capturing group.
  5. Enter “number” as the group’s name. You should give your capturing groups a name even if your regex flavor does not support named capture.
  6. Select “2 integer” in the “field” drop-down list. We now have a capturing group that will give us the text matched by field 2, if any.



  7. Click the New button on the Action panel again to add the second capturing group.
  8. Enter “email” as the group’s name.
  9. Select “3 email address” in the “field” drop-down list.
  10. On the Regex panel, select “C# (.NET 2.0–7.0)” as your application, turn on free-spacing, and turn off mode modifiers. Click the Generate button, and you’ll get the regular expression below. Notice the difference between this regex and the one we got in the example about matching unrelated items using alternation. Two named capturing groups have been added, and comments for each field indicate the capturing group names.
    # 1. One of the fields 2 to 3
      # 2. number: Integer
      (?<number>[0-9]+)
      |
      # 3. email: Email address
      (?<email>[!#$%&'*+./0-9=?_`a-z{|}~^-]+@[.0-9a-z-]+\.[a-z]{2,63})
    

    Required options: Case insensitive; Free-spacing.
    Unused options: Dot doesn’t match line breaks; ^$ don’t match at line breaks; Greedy quantifiers.

  11. On the Action panel, you can now look at the “actual backreferences” box. It indicates the name or number you’ll need to reference each group in your source code.



  12. On the Regex panel, select the JavaScript regular expression flavor. With the Generate button still down, you’ll get a regex that still has the group names in the comments. The actual capturing groups are unnamed, because the JavaScript flavor selected here does not support named capture (JavaScript only gained named groups with ECMAScript 2018).
    ([0-9]+)|([\d!#$%&'*+./=?_`a-z{|}~^-]+@[\d.a-z-]+\.[a-z]{2,63})

    Required options: Case insensitive.
    Unused options: Dot doesn’t match line breaks; ^$ don’t match at line breaks.

  13. On the Action panel, the actual backreference is now 1 for the group capturing the number, and 2 for the group capturing the email address.
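As a rough illustration of how backreference numbers 1 and 2 show up in source code, the sketch below passes the unnamed-group JavaScript regex to `String.prototype.replace` with a callback. The callback receives the groups by position, so naming its parameters `number` and `email` keeps the group names visible even though the regex itself has none. The function name `tagItems`, the bracketed tag format, and the sample text are assumptions made for this example. The `/` in the character class is escaped only because of regex-literal syntax.

```javascript
// Group 1 captures the number, group 2 captures the email address.
const numberOrEmail = /([0-9]+)|([\d!#$%&'*+.\/=?_`a-z{|}~^-]+@[\d.a-z-]+\.[a-z]{2,63})/gi;

// Hypothetical helper: wrap each match in a tag that says what it is.
function tagItems(text) {
  // Callback arguments are positional: overall match, group 1, group 2.
  return text.replace(numberOrEmail, (match, number, email) =>
    number !== undefined ? `[number: ${number}]` : `[email: ${email}]`
  );
}

console.log(tagItems("Call 555 or write to info@example.com"));
// prints "Call [number: 555] or write to [email: info@example.com]"
```

In a flavor with named capture, such as the .NET regex generated earlier, the same decision would be made by asking the match for the groups named “number” and “email” rather than relying on their positions.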


