Matching Unrelated Items Using Alternation

This example demonstrates how you can generate a regular expression that matches two or more unrelated bits of text in a file, regardless of where or how many times each bit of text occurs in the file. You can find this example as "Fields: alternation single" in the RegexMagic library.

For this example, we'll create a regex that finds all numbers and all email addresses in a file. To do this, we create a regex that matches either a number or an email address. When you apply this regex repeatedly using the "find all" command in an application or programming language, you get a list of all the numbers and email addresses.

  1. Click the New Formula button on the top toolbar to clear out all settings on the Samples, Match, and Action panels.
  2. Set both "begin regex match at" and "end regex match at" to "anywhere".
  3. Set the "field validation mode" to "average".
  4. On the Samples panel, paste in one new sample:
    My favorite number is 42.
    You can email me at joe@fortytwo.com.
    Other nice numbers are 17, 382, and 794.
    joefortytwo@gmail.com is my alternative email address.
    
  5. Select the number "42" in the sample.
  6. Click the Mark button above the sample to mark "42" as field 1. RegexMagic automatically detects the correct "integer" pattern for this field.
  7. Select the email address "joe@fortytwo.com" in the sample.
  8. Click the Mark button to mark the email address as a field. RegexMagic knows that because the email address is not adjacent to the number, the only way a regex could match both is by using alternation. Thus, RegexMagic adds a new field 1 with "kind of field" set to "alternation". The field with the integer pattern becomes field 2, which is the first alternative under field 1. The new field 3 is added as the second alternative under field 1. RegexMagic automatically detects the correct "email address" pattern for this field.



  9. On the Regex panel, select "C# (.NET 2.0–7.0)" as your application, turn on free-spacing, and turn off mode modifiers. Click the Generate button, and you'll get this regular expression:
    # 1. One of the fields 2 to 3
      # 2. Integer
      [0-9]+
      |
      # 3. Email address
      [!#$%&'*+./0-9=?_`a-z{|}~^-]+@[.0-9a-z-]+\.[a-z]{2,63}
    

    Required options: Case insensitive; Free-spacing.
    Unused options: Dot doesn’t match line breaks; ^$ don’t match at line breaks; Numbered capture.

  10. Because the Samples panel always highlights all regex matches like a "find all" command would, it now shows that our regex matches all the numbers and email addresses:
    My favorite number is 42.
    You can email me at joe@fortytwo.com.
    Other nice numbers are 17, 382, and 794.
    joefortytwo@gmail.com is my alternative email address.
    
  11. If you want to get the same results when programming, select one of the functions "get a list of all regex matches in a string" or "use regex object to get a list of all regex matches in a string" on the Use panel. Most of the source code templates that ship with RegexMagic have one of these functions. Some may use the term "array" instead of "list".
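As a rough illustration of what such a "find all" call does, here is a minimal Python sketch. It is not the source code that RegexMagic's Use panel generates. It assumes the regex from step 9 can be pasted into Python unchanged, with re.VERBOSE standing in for "Free-spacing" and re.IGNORECASE for "Case insensitive". The names NUMBER_OR_EMAIL, SAMPLE, and matches are made up for this sketch.

```python
import re

# The regex from step 9, copied as-is, comments included.
# Assumption: re.VERBOSE corresponds to "Free-spacing" and
# re.IGNORECASE to "Case insensitive" in the required options.
NUMBER_OR_EMAIL = re.compile(
    r"""
    # 1. One of the fields 2 to 3
      # 2. Integer
      [0-9]+
      |
      # 3. Email address
      [!#$%&'*+./0-9=?_`a-z{|}~^-]+@[.0-9a-z-]+\.[a-z]{2,63}
    """,
    re.IGNORECASE | re.VERBOSE,
)

# The sample text from step 4.
SAMPLE = """\
My favorite number is 42.
You can email me at joe@fortytwo.com.
Other nice numbers are 17, 382, and 794.
joefortytwo@gmail.com is my alternative email address.
"""

# findall() applies the regex repeatedly and returns every match as a list.
matches = NUMBER_OR_EMAIL.findall(SAMPLE)
print(matches)
# ['42', 'joe@fortytwo.com', '17', '382', '794', 'joefortytwo@gmail.com']
```

The list holds the matches in the order in which they occur in the text, which corresponds to what the Samples panel highlights in step 10.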

The generated regular expression uses alternation to combine the patterns for all the fields into one regular expression. As a result, it matches either a number or an email address, regardless of what surrounds it. To find all numbers and email addresses in a file, apply this regular expression repeatedly to the same text using a "find all" command in your application or programming language.
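If you also need to know which alternative produced each match, one possible approach is to wrap each alternative in a named capturing group. The Python sketch below is a variation on the generated regex, not something RegexMagic produced for this example. The group names "number" and "email" and the variable names are hypothetical.

```python
import re

# The same two alternatives as in the generated regex, each wrapped in a
# named group. The group names are hypothetical additions for this sketch.
LABELLED = re.compile(
    r"""
      (?P<number> [0-9]+ )
      |
      (?P<email>  [!#$%&'*+./0-9=?_`a-z{|}~^-]+@[.0-9a-z-]+\.[a-z]{2,63} )
    """,
    re.IGNORECASE | re.VERBOSE,
)

# The first two lines of the sample from step 4.
text = "My favorite number is 42.\nYou can email me at joe@fortytwo.com."

# Only one alternative takes part in each match, so lastgroup names it.
labelled = [(m.lastgroup, m.group()) for m in LABELLED.finditer(text)]
print(labelled)
# [('number', '42'), ('email', 'joe@fortytwo.com')]
```

Each match attempt tries the alternatives from left to right at the current position, so the integer pattern is tried before the email pattern.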
