Appendix C: Pike Regexp Syntax
    In LogView, Pike regular expressions are used to sort served files into categories, such as pages and non-pages, and to specify the names of hosts to ignore in the statistics. Pike regular expressions are very powerful, as the following examples show.

    A Pike regexp is the same kind of regexp used by many UNIX tools, such as egrep and awk. Readers experienced in this area may well stop reading here, or jump directly to the quick reference chart to refresh their memory of the syntax.

    Short introduction
    When constructing a regexp, characters are divided into normal characters (a - z, A - Z, 0 - 9) and special characters (for example ".", "*", "(", ")", "|" and "&"). A normal character matches itself, so a word built from normal characters matches itself. Also, a pattern matches a string if it matches any part of the string.

    Pattern    Matches                Does not match
    rhino      rhino                  www.hippo.potamus.com
               www.rhino.ceros.com
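
    The "matches any part of the string" rule can be tried outside LogView. The sketch below uses Python's re module as a stand-in for Pike's regexp engine; that is an assumption, although the two agree on the basic constructs used here. The helper name matches is made up for illustration.

```python
import re

def matches(pattern, host):
    """Return True if the pattern matches any part of host."""
    # re.search scans the whole string, so the pattern need not start at position 0.
    return re.search(pattern, host) is not None

for host in ["rhino", "www.rhino.ceros.com", "www.hippo.potamus.com"]:
    print(host, matches("rhino", host))
```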

    Sometimes we want to specify that the beginning or end of the string must match the pattern. This is done with the special characters ^ and $, respectively.

    Pattern    Matches              Does not match
    ^rhino     rhino                www.rhino.ceros.com
               rhino.ceros.com

    com$       rhino.ceros.com      www.rhino.ceros.com.tw
               hippo.potamus.com
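
    As a rough check of the two anchors, the following sketch filters a list of host names, again with Python's re module standing in for Pike's regexp engine (an assumption). The host names come from the tables in this section; the variable names are illustrative only.

```python
import re

hosts = [
    "rhino",
    "rhino.ceros.com",
    "www.rhino.ceros.com",
    "hippo.potamus.com",
    "www.rhino.ceros.com.tw",
]

# ^ ties the pattern to the start of the string, $ ties it to the end.
starts_with_rhino = [h for h in hosts if re.search(r"^rhino", h)]
ends_with_com = [h for h in hosts if re.search(r"com$", h)]

print(starts_with_rhino)
print(ends_with_com)
```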

    Now for the special characters "." and "*". A "." matches one occurrence of any character, and a character followed by a "*" matches any number, even zero, of consecutive occurrences of that character. The character before the "*" may itself be a ".", in which case zero or more of any character are matched. Thus ".*" has the same effect as a single "*" in a filename pattern at a UNIX or DOS prompt, which can be a bit confusing.

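    Pattern            Matches                Does not match
    ....rhino          www.rhino.ceros.com    rhino.ceros.com
                       yyy.rhino.ceros.com
                       yyyyrhino.ceros.com

    ^w*.rhino.ceros    .rhino.ceros.com       yyy.rhino.ceros.com
                       www.rhino.ceros.com

    The second pattern is anchored with ^ on purpose. Since "w*" may match zero characters and a pattern may match any part of a string, the unanchored pattern w*.rhino.ceros would also match yyy.rhino.ceros.com, through the substring ".rhino.ceros".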
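
    The behavior of "." and "*" can be sketched as follows, with Python's re module standing in for Pike's regexp engine (an assumption). One subtlety is worth checking: because "w*" may match zero characters and a pattern can match anywhere in a string, the unanchored pattern w*.rhino.ceros finds a match inside yyy.rhino.ceros.com, and only anchoring it with ^ rules that out. The last two lines compare regexp ".*" with a shell-style "*", using Python's fnmatchcase as the shell-pattern stand-in.

```python
import re
from fnmatch import fnmatchcase

def matches(pattern, host):
    """Return True if the pattern matches any part of host."""
    return re.search(pattern, host) is not None

# "." stands for exactly one character, so "...." needs four characters before "rhino".
print(matches("....rhino", "www.rhino.ceros.com"))
print(matches("....rhino", "yyyyrhino.ceros.com"))
print(matches("....rhino", "rhino.ceros.com"))

# "w*" may match zero characters, so the unanchored pattern matches the
# substring ".rhino.ceros"; the anchored pattern does not match this host.
print(matches("w*.rhino.ceros", "yyy.rhino.ceros.com"))
print(matches("^w*.rhino.ceros", "yyy.rhino.ceros.com"))

# Regexp ".*" plays the role that "*" plays in a shell filename pattern.
print(matches("^www.*com$", "www.rhino.ceros.com"))
print(fnmatchcase("www.rhino.ceros.com", "www*com"))
```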

    As can be seen above, the first pattern also matched yyyyrhino.ceros.com, since "." is a special character. To have it matched as a normal character, we have to put the escape character "\" in front of it.

    Pattern       Matches                Does not match
    ...\.rhino    www.rhino.ceros.com    wwwwceros.com
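
    A short sketch of escaping, once more using Python's re module as a stand-in for Pike's regexp engine (an assumption). It also shows re.escape, a Python convenience that quotes every special character in a literal string; nothing in this manual says LogView offers an equivalent.

```python
import re

def matches(pattern, host):
    """Return True if the pattern matches any part of host."""
    return re.search(pattern, host) is not None

# With "\." the dot before "rhino" must be a literal period.
print(matches(r"...\.rhino", "www.rhino.ceros.com"))
print(matches(r"...\.rhino", "yyyyrhino.ceros.com"))

# re.escape builds a pattern that matches the given text literally.
literal = re.escape("rhino.ceros.com")
print(matches(literal, "rhino.ceros.com"))
print(matches(literal, "rhinoXcerosXcom"))
```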

    That covers the basics of simple regexps. Several simple regexps can also be combined into complex ones using boolean operators: for example, (regexp1) | (regexp2) is a new regexp that matches a string if regexp1 or regexp2 matches the string, and similarly (regexp1) & (regexp2) matches a string if both regexp1 and regexp2 match the string.
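
    The two combinations can be sketched in Python, with re standing in for Pike's regexp engine (an assumption). Python's re supports "|" but has no "&" operator, so the "both must match" case is emulated here by testing the two patterns separately; that substitution, and the helper name matches_all, are ours and not part of LogView.

```python
import re

def matches_all(patterns, text):
    """Emulate (regexp1) & (regexp2): every pattern must match the text."""
    return all(re.search(p, text) for p in patterns)

# (regexp1)|(regexp2): either alternative is enough.
either = r"(^rhino)|(com$)"
print(bool(re.search(either, "rhino.ceros.org")))
print(bool(re.search(either, "hippo.potamus.com")))
print(bool(re.search(either, "www.rhino.ceros.org")))

# Both conditions at once: starts with "rhino" and ends with "com".
both = [r"^rhino", r"com$"]
print(matches_all(both, "rhino.ceros.com"))
print(matches_all(both, "rhino.ceros.org"))
```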

    Quick reference chart

    This is the complete reference chart for the Pike regexp syntax, taken from the Pike manual.

    Pattern    Matches
    .          any one character
    [abc]      a, b or c
    [a-z]      any character a to z inclusive
    [^ac]      any character except a and c
    (x)        x (x might be any regexp). If used with split, this also puts the string matching x into the result array.
    x*         zero or more occurrences of 'x' (x may be any regexp)
    x+         one or more occurrences of 'x' (x may be any regexp)
    x|y        x or y (x or y may be any regexp)
    xy         xy (x and y may be any regexp)
    ^          beginning of string (but no characters)
    $          end of string (but no characters)
    \<         the beginning of a word (but no characters)
    \>         the end of a word (but no characters)
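
    Two chart entries are easier to grasp with an example: parenthesized groups, whose matched text Pike's split puts into the result array, and the word anchors. The sketch below is only an analogy in Python. It assumes that Match.groups() plays the role of the split result, and it uses Python's \b, which marks either edge of a word, because Python's re has no \< and \>. The helper name split_groups is made up.

```python
import re

def split_groups(pattern, text):
    """Return the substrings captured by the parenthesized groups, or None."""
    m = re.search(pattern, text)
    return list(m.groups()) if m else None

# Two groups: everything before the last period, and the extension after it.
print(split_groups(r"^(.*)\.([a-z]+)$", "/images/logo.gif"))
print(split_groups(r"^(.*)\.([a-z]+)$", "/cgi-bin/"))

# Word edges: "rhino" as a whole word, not as the start of a longer word.
whole_word = r"\brhino\b"
print(bool(re.search(whole_word, "the rhino page")))
print(bool(re.search(whole_word, "rhinoceros")))
```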

    Let's look at a few examples:

    Regexp         Matches
    [0-9]+         one or more digits
    [^ \t\n]       exactly one non-whitespace character
    (foo)|(bar)    either 'foo' or 'bar'
    \.html$        any string ending in '.html'
    ^\.            any string starting with a period

    Note that \ can be used to quote any of these special characters, in which case they match themselves and nothing else. Also note that when writing such a quoted character inside a Pike string you need to write \\, because Pike also uses \ for quoting in strings.
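
    The double-backslash rule has a direct counterpart in Python, which also uses \ for quoting inside ordinary strings. The sketch below first shows that the doubled form and a Python raw string give the same pattern text, then tries each example regexp against one string that should match and one that should not. Python's re is again a stand-in for Pike's regexp engine (an assumption), and the test strings are invented for illustration.

```python
import re

# In an ordinary string literal the backslash must be doubled;
# a raw string (r"...") leaves it alone. Both give the same pattern text.
same = "\\.html$" == r"\.html$"
print(same)

# pattern -> (string expected to match, string expected not to match)
examples = {
    r"[0-9]+": ("page42", "page"),
    r"[^ \t\n]": ("  x  ", " \t\n"),
    r"(foo)|(bar)": ("foobar", "baz"),
    r"\.html$": ("index.html", "index.html.bak"),
    r"^\.": (".htaccess", "index.html"),
}

for pattern, (hit, miss) in examples.items():
    print(pattern, bool(re.search(pattern, hit)), bool(re.search(pattern, miss)))
```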