Matching Complex Unrelated Items Using Alternation of Sequences

This example demonstrates how you can generate a regular expression that matches two or more unrelated bits of text in a file, regardless of where or how many times each bit of text occurs in the file. The bits of text are complex, meaning multiple RegexMagic patterns must be placed in sequence to match them. You can find this example as “Fields: alternation sequence” in the RegexMagic library.

For this example, we’ll create a regex that finds all numbers and all email addresses in a file, with the numbers between square brackets and the email addresses between angle brackets. The regex matches should include the delimiters. To do this, we create a regex that matches either a number between square brackets or an email address between angle brackets. When you repeat this regex using the “find all” command in an application or programming language, you will get a list of all the numbers and email addresses with their delimiters.
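To make the idea concrete before walking through the steps, here is a minimal Python sketch of "alternation of sequences" combined with a "find all" call. The two patterns are deliberately simplified stand-ins written for this illustration (in particular, the email pattern is far looser than what RegexMagic generates), and the names `NUMBER_IN_BRACKETS`, `EMAIL_IN_ANGLES`, and `EITHER` are hypothetical.

```python
import re

# Simplified stand-in patterns (assumptions for illustration only).
# Each one is a sequence: opening delimiter, content, closing delimiter.
NUMBER_IN_BRACKETS = r"\[[0-9]+\]"
EMAIL_IN_ANGLES = r"<[^<>\s@]+@[^<>\s@]+>"

# Alternation: at any position, try the first sequence, then the second.
EITHER = re.compile(NUMBER_IN_BRACKETS + "|" + EMAIL_IN_ANGLES)

text = "See [1] or write to <a@b.example>, then check [23]."

# "Find all": apply the regex repeatedly across the whole string.
found = EITHER.findall(text)
print(found)  # ['[1]', '<a@b.example>', '[23]']
```

Because each alternative is a complete sequence, the matches come back in the order they occur in the text, no matter which kind of item appears first or how often.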

  1. Click the New Formula button on the top toolbar to clear out all settings on the Samples, Match, and Action panels.
  2. Set both “begin regex match at” and “end regex match at” to “anywhere”.
  3. Set the “field validation mode” to “average”.
  4. On the Samples panel, paste in one new sample:
    My favorite number is [42].
    You can email me at <joe@fortytwo.com>.
    Other nice numbers are [17], [382], and [794].
    <joefortytwo@gmail.com> is my alternative email address.
    
  5. Select the first [ in the sample, immediately before the number 42 on the first line.
  6. Click the Mark button above the sample to mark the opening square bracket as field 1.
  7. RegexMagic automatically detects the correct “literal text” pattern for this field.
  8. Select the number “42” in the sample.
  9. Click the Mark button to mark “42” as field 2. RegexMagic automatically detects the correct “integer” pattern for this field.
  10. Select the ] right after the number 42 that we just marked.
  11. Click the Mark button to mark the closing square bracket as field 3. Again we get the “literal text” pattern we want.
  12. Select the first < in the sample, immediately before “joe@fortytwo.com”.
  13. Click the Mark button to mark the opening angle bracket as a field. Because the angle bracket is not adjacent to the three fields we just marked, RegexMagic knows that the only way a regex could match both is by using alternation. Thus, RegexMagic adds a new field 1 with “kind of field” set to “alternation”. The first alternative is a new field 2, with “kind of field” set to “sequence”. The three fields we marked previously are placed under the sequence field, with their field numbers changed to 3, 4, and 5. The new field to match the angle bracket is field 6, which is added as the second alternative under field 1.
  14. Select the email address “joe@fortytwo.com” in the sample.
  15. Click the Mark button to add a field for the email address. Since we’re marking this immediately after field 6, RegexMagic assumes that we always want the email address to follow immediately after the angle bracket. To accomplish this, RegexMagic adds a new field 6, with “kind of field” set to “sequence”. The field for the angle bracket becomes field 7 as the first field in the sequence. Field 8 is the new field for the email address.
  16. Select the > after the email address we just marked.
  17. Click the Mark button one more time to add field 9 for the closing angle bracket. Again, RegexMagic assumes that we always want the angle bracket to follow the email address, so field 9 becomes the third field under sequence field 6.



  18. On the Regex panel, select “C# (.NET 2.0–7.0)” as your application, turn on free-spacing, and turn off mode modifiers. Click the Generate button, and you’ll get this regular expression:
    # 1. One of the fields 2 to 6
      # 2. Fields 3 to 5 in sequence
      
        # 3. Literal text
        \[
        # 4. Integer
        [0-9]+
        # 5. Literal text
        \]
      
      |
      # 6. Fields 7 to 9 in sequence
      
        # 7. Literal text
        <
        # 8. Email address
        [!#$%&'*+./0-9=?_`a-z{|}~^-]+@[.0-9a-z-]+\.[a-z]{2,63}
        # 9. Literal text
        >
      
    

    Required options: Case insensitive; Free-spacing.
    Unused options: Dot doesn’t match line breaks; ^$ don’t match at line breaks; Numbered capture.

  19. Because the Samples panel always highlights all regex matches like a “find all” command would, it now shows that our regex matches all the numbers and email addresses with their delimiters:
    My favorite number is [42].
    You can email me at <joe@fortytwo.com>.
    Other nice numbers are [17], [382], and [794].
    <joefortytwo@gmail.com> is my alternative email address.
    
  20. If you want to get the same results when programming, select one of the functions “get a list of all regex matches in a string” or “use regex object to get a list of all regex matches in a string” on the Use panel. Most of the source code templates that ship with RegexMagic have one of these functions. Some may use the term “array” instead of “list”.
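As a rough illustration of the “get a list of all regex matches in a string” function, the sketch below applies the regex generated above to the sample text. It uses Python rather than C#, so it is not one of RegexMagic’s source code templates. It assumes that Python’s `re` module handles this particular regex the same way as the .NET flavor, with `re.VERBOSE` standing in for free-spacing and `re.IGNORECASE` for case insensitive. The variable names are hypothetical.

```python
import re

# The generated regex, copied from the Regex panel output in this example.
# re.VERBOSE ignores the whitespace and the "#" comments outside character
# classes; re.IGNORECASE covers the "Case insensitive" required option.
pattern = re.compile(r"""
# 1. One of the fields 2 to 6
  # 2. Fields 3 to 5 in sequence
    # 3. Literal text
    \[
    # 4. Integer
    [0-9]+
    # 5. Literal text
    \]
  |
  # 6. Fields 7 to 9 in sequence
    # 7. Literal text
    <
    # 8. Email address
    [!#$%&'*+./0-9=?_`a-z{|}~^-]+@[.0-9a-z-]+\.[a-z]{2,63}
    # 9. Literal text
    >
""", re.VERBOSE | re.IGNORECASE)

sample = """My favorite number is [42].
You can email me at <joe@fortytwo.com>.
Other nice numbers are [17], [382], and [794].
<joefortytwo@gmail.com> is my alternative email address.
"""

# With no capturing groups, findall() returns the full text of each match.
matches = pattern.findall(sample)
for match in matches:
    print(match)
# [42]
# <joe@fortytwo.com>
# [17]
# [382]
# [794]
# <joefortytwo@gmail.com>
```

The output lists the same six matches, delimiters included, that the Samples panel highlights.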
