Table of Contents

  • Introduction
  • Administrator
  • User
  • Appendix
    · Introduction
    · Appendix A: Available Functions
    · Appendix B: The LogView Tag
    · Appendix C: Pike Regexp Syntax
  • Appendix C: Pike Regexp Syntax
    In LogView, Pike regular expressions are used to sort served files into categories, for example pages and non-pages, and to specify the names of hosts to ignore in the statistics. Pike regular expressions are very powerful, as the following examples show.

    A Pike regexp is the same kind of regexp used by many UNIX tools, such as egrep and awk. Readers experienced in this area may stop reading here, or jump directly to the quick reference chart to refresh their memory of the syntax.

    Short introduction
    When constructing a regexp, characters are divided into normal characters (a - z, A - Z, 0 - 9) and special characters (for example ".", "*", "(", ")", "|" and "&"). A normal character matches itself, so a word built from normal characters matches itself. A pattern matches a string if it matches any part of the string.

    Pattern    Matches                Does not match
    rhino      rhino                  www.hippo.potamus.com
               www.rhino.ceros.com

    Sometimes we want to specify that the beginning or end of the string must match the pattern. This is done with the special characters ^ and $, respectively.

    Pattern    Matches              Does not match
    ^rhino     rhino                www.rhino.ceros.com
               rhino.ceros.com
    com$       rhino.ceros.com      www.rhino.ceros.com.tw
               hippo.potamus.com
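
    The examples above can be checked mechanically. The sketch below uses Python's re module as a stand-in for Pike's matcher (an assumption; LogView itself evaluates Pike regexps), because re.search likewise succeeds when the pattern matches any part of the string. The helper name matches is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches any part of text (unanchored search)."""
    return re.search(pattern, text) is not None

# Unanchored: the pattern may match anywhere in the host name.
unanchored = [matches("rhino", h)
              for h in ("rhino", "www.rhino.ceros.com", "www.hippo.potamus.com")]

# ^ pins the match to the start of the string, $ to the end.
starts = [matches("^rhino", h)
          for h in ("rhino", "rhino.ceros.com", "www.rhino.ceros.com")]
ends = [matches("com$", h)
        for h in ("rhino.ceros.com", "hippo.potamus.com", "www.rhino.ceros.com.tw")]
```

    In each list the first two host names are accepted and the last one is rejected, in line with the Matches and Does not match columns.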

    Now for the special characters "." and "*". A "." matches one occurrence of any character, and a character followed by a "*" matches any number, even zero, of consecutive occurrences of that character. The character before the "*" may itself be a ".", in which case zero or more of any character are matched. Thus ".*" has the same effect as a single "*" in a filename pattern at a UNIX or DOS prompt, which can be a bit confusing.

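    Pattern           Matches                Does not match
    ....rhino         www.rhino.ceros.com    rhino.ceros.com
                      yyy.rhino.ceros.com
                      yyyyrhino.ceros.com
    w*.rhino.ceros    .rhino.ceros.com
                      yyy.rhino.ceros.com
                      www.rhino.ceros.com

    Because "w*" can match zero characters and a pattern may match any part of a string, the second pattern also matches yyy.rhino.ceros.com.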

    As the table shows, the first pattern also matches yyyyrhino.ceros.com, since "." is a special character. To match "." as a normal character, put the escape character "\" in front of it.

    Pattern       Matches                Does not match
    ...\.rhino    www.rhino.ceros.com    wwwwceros.com
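
    To see the difference between a bare "." and an escaped "\." side by side, here is a minimal sketch, again assuming Python's re module as a stand-in for Pike's matcher. The variable names are illustrative only.

```python
import re

hosts = ["www.rhino.ceros.com", "yyy.rhino.ceros.com",
         "yyyyrhino.ceros.com", "rhino.ceros.com"]

# "." matches any single character, so four dots accept "yyyy" as well as "www.".
any_four = [h for h in hosts if re.search(r"....rhino", h)]

# "\." matches only a literal period, which rules out "yyyyrhino...".
literal_dot = [h for h in hosts if re.search(r"...\.rhino", h)]

# "w*" may match zero characters, so a host with no leading "w" is accepted too.
star = [re.search(r"w*.rhino.ceros", h) is not None
        for h in (".rhino.ceros.com", "www.rhino.ceros.com")]
```

    Here any_four keeps the first three hosts, literal_dot keeps only the first two, and both entries of star are True.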

    That covers the basics of simple regexps. Simple regexps can also be combined into complex ones with boolean operators: (regexp1) | (regexp2) is a new regexp that matches a string if regexp1 or regexp2 matches it, and similarly (regexp1) & (regexp2) matches a string if both regexp1 and regexp2 match it.
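
    One way to picture the two boolean operators is sketched below, with Python's re module standing in for Pike's matcher (an assumption). Python supports "|" directly but has no "&" operator, so the hypothetical helper matches_both emulates it by requiring two separate searches to succeed. The patterns are illustrative and not taken from a LogView configuration.

```python
import re

def matches_either(regexp1: str, regexp2: str, text: str) -> bool:
    """(regexp1) | (regexp2): succeed if at least one pattern matches."""
    return re.search(f"({regexp1})|({regexp2})", text) is not None

def matches_both(regexp1: str, regexp2: str, text: str) -> bool:
    """Emulates (regexp1) & (regexp2): succeed only if both patterns match."""
    return (re.search(regexp1, text) is not None
            and re.search(regexp2, text) is not None)

# "www.rhino.ceros.com" ends in "com" but does not start with "rhino".
either = matches_either("^rhino", "com$", "www.rhino.ceros.com")
both = matches_both("^rhino", "com$", "www.rhino.ceros.com")

# "rhino.ceros.com" satisfies both conditions.
both_ok = matches_both("^rhino", "com$", "rhino.ceros.com")
```

    The first host passes the "or" test but fails the "and" test, while the second host passes the "and" test.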

    Quick reference chart

    This is the complete reference chart for the Pike regexp syntax, taken from the Pike manual.

    Pattern    Matches
    .          any one character
    [abc]      a, b or c
    [a-z]      any character a to z inclusive
    [^ac]      any character except a and c
    (x)        x (x may be any regexp). If used with split, this also puts the string matching x into the result array.
    x*         zero or more occurrences of 'x' (x may be any regexp)
    x+         one or more occurrences of 'x' (x may be any regexp)
    x|y        x or y (x and y may be any regexp)
    xy         xy (x and y may be any regexp)
    ^          beginning of string (but no characters)
    $          end of string (but no characters)
    \<         the beginning of a word (but no characters)
    \>         the end of a word (but no characters)

    Let's look at a few examples:

    Regexp         Matches
    [0-9]+         one or more digits
    [^ \t\n]       exactly one non-whitespace character
    (foo)|(bar)    either 'foo' or 'bar'
    \.html$        any string ending in '.html'
    ^\.            any string starting with a period
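
    The five examples can be exercised with a small table-driven check. This is a sketch that assumes Python's re module behaves like Pike's matcher for these particular patterns; the sample strings are made up for illustration.

```python
import re

# (regexp, string that should match, string that should not match)
examples = [
    (r"[0-9]+", "page42", "page"),
    (r"[^ \t\n]", " x ", " \t\n"),
    (r"(foo)|(bar)", "rebar", "baz"),
    (r"\.html$", "/index.html", "/index.html.bak"),
    (r"^\.", ".profile", "a.profile"),
]

# For each row, record whether the good and the bad sample match.
results = [
    (re.search(rx, good) is not None, re.search(rx, bad) is not None)
    for rx, good, bad in examples
]
```

    Every entry of results is (True, False): the first sample in each row matches and the second does not.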

    Note that \ can be used to quote these special characters, in which case they match themselves and nothing else. Also note that when quoting a character this way in Pike source code you need to write \\, because Pike string literals also use \ for quoting.
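
    The double backslash is a property of string literals, not of the regexp itself. Python string literals follow the same convention, so the sketch below (Python standing in for Pike, an assumption) shows that the literal "\\.html$" holds the seven-character regexp \.html$.

```python
import re

# In an ordinary string literal, "\\" denotes one backslash character.
pattern = "\\.html$"

length = len(pattern)                 # number of characters the regexp engine sees
same_as_raw = pattern == r"\.html$"   # a raw literal avoids the doubling

escaped = re.search(pattern, "index.html") is not None     # literal ".html" at the end
unescaped = re.search(".html$", "indexhtml") is not None   # bare "." accepts any character
rejected = re.search(pattern, "indexhtml") is not None     # the escaped "." does not
```

    Forgetting the escape is easy to miss in practice, because the unescaped pattern still accepts every string the escaped one does; it merely accepts extra strings such as "indexhtml".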