Pattern: Unicode characters

“Unicode characters” is one of the patterns that you can select on the Match panel. Use this pattern to restrict a field to a certain set of Unicode characters. The repetition settings for the field determine how many characters the field can or must match.

This example shows how you can use the “Unicode characters” pattern to match a currency symbol. You can find this example as “Pattern: Unicode characters” in the RegexMagic library.

  1. Click the New Formula button on the top toolbar to clear out all settings on the Samples, Match, and Action panels.
  2. On the Samples panel, paste in one new sample:
    Some of the world's major currency symbols are $, €, ¥, and £.
  3. On the Match panel, set both “begin regex match at” and “end regex match at” to “anywhere”.
  4. Click the Add First Field button to add field 1.
  5. In the “pattern to match field” drop-down list, select “Unicode characters”.
  6. In the “Unicode categories” list, tick “currency symbols”.
  7. On the Regex panel, select “C# (.NET 2.0–7.0)” as your application, turn off free-spacing, and turn off mode modifiers. Click the Generate button, and you’ll get this regular expression, which matches a single character that the Unicode standard classifies as a currency symbol:
    \p{Sc}

    Unused options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Numbered capture.

  8. The Samples panel now shows that the regular expression finds 4 matches in the sample text:
    Some of the world's major currency symbols are $, €, ¥, and £.
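
As a rough cross-check outside RegexMagic, the sketch below applies the generated regex to the sample text. It uses Java's java.util.regex package rather than C#, on the assumption that Java's support for the \p{Sc} general category behaves the same way for these characters. The class name, the SAMPLE constant, and the findCurrencySymbols method are hypothetical; only Pattern and Matcher come from the JDK. It should report the same 4 matches the Samples panel shows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CurrencySymbols {
    // \p{Sc} is the Unicode general category "Symbol, currency".
    private static final Pattern CURRENCY = Pattern.compile("\\p{Sc}");

    // The sample text from the Samples panel. The euro, yen, and pound signs
    // are written as Unicode escapes so the file compiles under any source encoding.
    static final String SAMPLE =
        "Some of the world's major currency symbols are $, \u20AC, \u00A5, and \u00A3.";

    // Returns every substring matched by \p{Sc}, in order of appearance.
    static List<String> findCurrencySymbols(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = CURRENCY.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> found = findCurrencySymbols(SAMPLE);
        System.out.println(found.size() + " matches: " + found);
    }
}
```

Each match is a single character, because \p{Sc} has no quantifier; changing the field's repetition settings in RegexMagic would add one. In C#, Regex.Matches(sample, @"\p{Sc}") plays the same role as the Matcher loop.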
