Pattern: IPv4 Address

“IPv4 address” is one of the patterns that you can select on the Match panel. Use it to make a field match an internet address such as 192.168.0.1.

The documentation for Google Analytics explains how you can use a regular expression in Google Analytics to filter on IP addresses. We’ll follow the same example here, but explain how to generate the regular expression with RegexMagic.

  1. Click the New Formula button on the top toolbar to clear out all settings on the Samples, Match, and Action panels.
  2. On the Samples panel, paste in one new sample:
    10.1.1.1
    10.255.255.254
    172.0.0.1
    172.16.0.1
    172.31.255.254
    172.255.255.254
    192.168.1.1
    192.168.255.254
    63.212.171.1
    63.212.171.254
    8.8.8.8
    8.8.4.4
    
  3. Set the subject scope to “line by line”.
  4. On the Match panel, set “begin regex match at” to “start of text”, and set “end regex match at” to “end of text”.
  5. Click the Add First Field button to add field 1.
  6. In the “pattern to match field” drop-down list, select “IPv4 address”. By default, this pattern matches any IPv4 address in dotted notation.



  7. Turn on “limit the IPv4 addresses to these ranges”.
  8. Enter 63.212.171.1..63.212.171.254 as the range to limit the IP addresses to.
  9. Set the “field validation mode” to “strict”. This is necessary to make RegexMagic limit the range of IP addresses to exactly what you specified.
  10. On the Regex panel, select “POSIX ERE” as your application. This is the regular expression flavor used by Google Analytics. It is limited in features, but offers all you need to match IP address ranges.
  11. Click the Generate button, and you’ll get this regular expression:
    ^63\.212\.171\.(25[0-4]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]?)$

    Required options: ^$ don’t match at line breaks; dot matches line breaks.
    Unused options: Case sensitive.

  12. The Samples panel now highlights the IP addresses in the range that we specified, which are 63.212.171.1 and 63.212.171.254:
    10.1.1.1
    10.255.255.254
    172.0.0.1
    172.16.0.1
    172.31.255.254
    172.255.255.254
    192.168.1.1
    192.168.255.254
    63.212.171.1
    63.212.171.254
    8.8.8.8
    8.8.4.4
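
The result of step 12 can be reproduced outside RegexMagic as a quick sanity check. The sketch below uses Python's `re` module, which is an assumption made for illustration: Python is not a POSIX ERE engine, but this particular regex only uses syntax that both flavors share. The helper name `matching_samples` is hypothetical. Each sample is tested on its own, mirroring the "line by line" subject scope.

```python
import re

# Regex generated in step 11 (strict mode, range 63.212.171.1..63.212.171.254).
STRICT_RANGE = re.compile(
    r"^63\.212\.171\.(25[0-4]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]?)$"
)

# The sample pasted into the Samples panel in step 2, split into lines.
SAMPLES = """10.1.1.1
10.255.255.254
172.0.0.1
172.16.0.1
172.31.255.254
172.255.255.254
192.168.1.1
192.168.255.254
63.212.171.1
63.212.171.254
8.8.8.8
8.8.4.4""".splitlines()


def matching_samples(pattern, lines):
    """Return the lines that the pattern matches, in their original order."""
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    for address in matching_samples(STRICT_RANGE, SAMPLES):
        print(address)
```

Only the two 63.212.171.x samples are printed. The addresses 63.212.171.0 and 63.212.171.255 lie just outside the range and are rejected.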
    

Keeping The Regex Short Enough

The regex engine used by Google Analytics does not support regular expressions longer than 255 characters. The POSIX ERE specification allows this limitation but does not require it. RegexMagic always tries to generate regular expressions that are as short and simple as possible. But it will allow the regular expression to be as long as it needs to be to make it do what you specified. If the resulting regex is too long, RegexMagic does provide several options to make it shorter if you allow a change in specifications.

To keep your regex short, turn off the option “validate 0..255 in dotted addresses” and specify ranges that span the whole 0..255 range as much as possible. So specify 63.212.171.0..63.212.171.255 or 63.212.171.0/24 instead of 63.212.171.1..63.212.171.254. When you do this, RegexMagic generates [0-9]{1,3} rather than (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) to match a number between 0 and 255. Though [0-9]{1,3} could match numbers such as 000 or 999 that don’t belong in IPv4 addresses, this isn’t a problem when filtering web logs. The logs don’t contain invalid IP addresses, so there’s no need to make the regex longer in order to exclude them. The regex from our example then becomes:

^63\.212\.171\.[0-9]{1,3}$

Required options: ^$ don’t match at line breaks; dot matches line breaks.
Unused options: Case sensitive.
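
To make the trade-off concrete, the following sketch builds the whole-range regex around each of the two sub-patterns quoted above and compares their length and behavior. It uses Python's `re` module as a stand-in for a POSIX ERE engine, and the helper names `whole_range_regex` and `accepts` are hypothetical.

```python
import re

# The two sub-patterns quoted in the text for one number in a dotted address.
VALIDATED_OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
UNVALIDATED_OCTET = r"[0-9]{1,3}"


def whole_range_regex(octet):
    """Build a regex for 63.212.171.0..63.212.171.255 around an octet sub-pattern."""
    return r"^63\.212\.171\." + octet + "$"


def accepts(regex, text):
    """Return True if the regex matches the text."""
    return re.search(regex, text) is not None


if __name__ == "__main__":
    for octet in (VALIDATED_OCTET, UNVALIDATED_OCTET):
        regex = whole_range_regex(octet)
        # Print the length, the regex, and whether it accepts an invalid address.
        print(len(regex), regex, accepts(regex, "63.212.171.999"))
```

The unvalidated variant is shorter and accepts 63.212.171.999, while the validated one rejects it. Both accept every real address in the range.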

The RegexMagic pattern for IPv4 addresses allows you to specify as many IP ranges as you want, delimited with semicolons. RegexMagic will roll all the ranges into one big regex. You can match any private IPv4 address by setting the range to 10.0.0.0/8;172.16.0.0/12;192.168.0.0/16. RegexMagic combines these 3 ranges into one compact regex that still strictly matches all 3:

^(10\.[0-9]{1,3}|172\.(3[01]|2[0-9]|1[6-9])|192\.168)\.[0-9]{1,3}\.[0-9]{1,3}$

Required options: ^$ don’t match at line breaks; dot matches line breaks.
Unused options: Case sensitive.
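
One way to convince yourself that the combined regex covers exactly the three ranges is to compare it with a procedural membership test. The sketch below does this with Python's standard `ipaddress` module. It is an illustration under the assumption that Python's `re` treats this regex the same way a POSIX ERE engine would. The function names are hypothetical.

```python
import ipaddress
import re

# Combined regex for the three private ranges, as shown above.
PRIVATE_REGEX = re.compile(
    r"^(10\.[0-9]{1,3}|172\.(3[01]|2[0-9]|1[6-9])|192\.168)\.[0-9]{1,3}\.[0-9]{1,3}$"
)

# The same three ranges, expressed as networks for a procedural check.
PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def in_private_ranges(text):
    """Return True if the (valid) dotted address lies in one of the three ranges."""
    address = ipaddress.ip_address(text)
    return any(address in network for network in PRIVATE_NETWORKS)


def regex_agrees(text):
    """Return True if the regex and the procedural check give the same answer."""
    return (PRIVATE_REGEX.search(text) is not None) == in_private_ranges(text)


if __name__ == "__main__":
    for sample in ("10.1.1.1", "172.0.0.1", "172.16.0.1", "172.31.255.254",
                   "172.255.255.254", "192.168.1.1", "8.8.8.8"):
        print(sample, in_private_ranges(sample), regex_agrees(sample))
```

The comparison only makes sense for valid addresses, because `ipaddress.ip_address` raises `ValueError` for input such as 10.999.1.1, which the regex itself would accept.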

While Google Analytics can only handle regular expressions up to 255 characters, it does allow you to use multiple filters, all with their own regular expression. So if you want to filter on multiple IP address ranges and the regex RegexMagic generates for all those ranges is too long, have RegexMagic generate one regex for each range, and use those regexes in separate filters in Google Analytics.
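
As a rough sketch of this splitting approach, the code below checks each regex against the 255-character limit and treats an address as filtered if any one of the per-range regexes matches it. The three per-range regexes are hypothetical: they were written by hand by splitting the combined private-range regex, not generated by RegexMagic. The "any filter matches" logic is also an assumption about how separate filters combine, and Python's `re` module stands in for the Google Analytics engine.

```python
import re

MAX_REGEX_LENGTH = 255  # Limit stated in the text for Google Analytics.

# Hypothetical per-range regexes, one per filter (hand-written for illustration).
RANGE_FILTERS = [
    r"^10\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$",
    r"^172\.(3[01]|2[0-9]|1[6-9])\.[0-9]{1,3}\.[0-9]{1,3}$",
    r"^192\.168\.[0-9]{1,3}\.[0-9]{1,3}$",
]


def too_long(regex):
    """Return True if the regex exceeds the 255-character limit."""
    return len(regex) > MAX_REGEX_LENGTH


def matches_any_filter(text, filters=RANGE_FILTERS):
    """Return True if at least one of the separate filter regexes matches."""
    return any(re.search(regex, text) is not None for regex in filters)


if __name__ == "__main__":
    print([too_long(regex) for regex in RANGE_FILTERS])
    print(matches_any_filter("172.16.0.1"), matches_any_filter("8.8.8.8"))
```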

Setting the “field validation mode” to “strict” as this example does is appropriate when a regular expression is all you can use to restrict the range of IP addresses. But in other situations, such as when developing your own software, you may be better off using a simpler regex that just matches any combination of 4 numbers with dots between them, and then filtering the IP addresses in procedural code. If you’ll be processing the matched addresses in procedural code anyway, you’ll likely get better performance from a simple regex with a few extra checks in procedural code.
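
A minimal sketch of that alternative, assuming Python and its standard `ipaddress` module: a simple regex finds anything shaped like 1.2.3.4, and procedural code then validates each candidate and checks it against the example range 63.212.171.1..63.212.171.254. The function name `addresses_in_range` and the word boundaries around the regex are choices made for this illustration.

```python
import ipaddress
import re

# Simple regex: any 4 numbers of 1 to 3 digits with dots between them.
ANY_DOTTED_QUAD = re.compile(r"\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b")

# Range from the example, checked in procedural code instead of in the regex.
RANGE_START = ipaddress.ip_address("63.212.171.1")
RANGE_END = ipaddress.ip_address("63.212.171.254")


def addresses_in_range(text, start=RANGE_START, end=RANGE_END):
    """Return the dotted addresses in text that are valid and lie within start..end."""
    found = []
    for candidate in ANY_DOTTED_QUAD.findall(text):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # Shaped like an address but not valid, such as 999.1.1.1.
        if start <= address <= end:
            found.append(candidate)
    return found


if __name__ == "__main__":
    log_line = "hits from 63.212.171.7, 8.8.8.8 and 999.1.1.1"
    print(addresses_in_range(log_line))
```

The range bounds are ordinary values here, so changing the range does not require regenerating the regex.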

If you set the “field validation mode” to “average”, RegexMagic will generate a regular expression that is much shorter. The regex still requires a match in the form of 1.2.3.4. But if you set the pattern to restrict the address to certain ranges, the regex may match some addresses beyond the ranges you specified. For example, using “average” field validation with the range set to 10.0.0.0/8;172.16.0.0/12;192.168.0.0/16, the resulting regex will match any IPv4 address that begins with 10, 172, or 192. That includes some addresses that aren’t private IPv4 addresses. But you can see that the “average” regex is simpler than the previous “strict” regex:

^(10|172|192)\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$

Required options: ^$ don’t match at line breaks; dot matches line breaks.
Unused options: Case sensitive.
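
The difference between the two modes can be seen by running both regexes over the samples from this example and listing what only the “average” regex accepts. This is a sketch using Python's `re` module as a stand-in for a POSIX ERE engine. The helper name `extra_matches` is hypothetical.

```python
import re

# "Strict" and "average" regexes for 10.0.0.0/8;172.16.0.0/12;192.168.0.0/16.
STRICT = re.compile(
    r"^(10\.[0-9]{1,3}|172\.(3[01]|2[0-9]|1[6-9])|192\.168)\.[0-9]{1,3}\.[0-9]{1,3}$"
)
AVERAGE = re.compile(r"^(10|172|192)\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$")

SAMPLES = [
    "10.1.1.1", "10.255.255.254", "172.0.0.1", "172.16.0.1",
    "172.31.255.254", "172.255.255.254", "192.168.1.1", "192.168.255.254",
    "63.212.171.1", "63.212.171.254", "8.8.8.8", "8.8.4.4",
]


def extra_matches(lines):
    """Return the lines matched by the average regex but not by the strict one."""
    return [
        line for line in lines
        if AVERAGE.search(line) and not STRICT.search(line)
    ]


if __name__ == "__main__":
    print(extra_matches(SAMPLES))
```

For these samples the extra matches are 172.0.0.1 and 172.255.255.254, which begin with 172 but fall outside 172.16.0.0/12.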

The more complex your combination of IP ranges, the bigger the difference between “average” and “strict” modes. If “average” isn’t simple enough, “loose” mode grabs any IPv4 address, as if both “limit the IPv4 addresses to these ranges” and “validate 0..255 in dotted addresses” were turned off:

^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$

Required options: ^$ don’t match at line breaks; dot matches line breaks.
Unused options: Case sensitive.
