', on the
other hand,
It is strongly suggested that, for ease of parsing, any tags
that you do not explicitly close have a `/' at the end of
the tag:
Here's my cat:
Methods for `YAPE::HTML'
* `use YAPE::HTML;'
* `use YAPE::HTML qw( MyExt::Mod );'
If no arguments are supplied, the module is loaded normally,
and the node classes are given the proper inheritance (from
`YAPE::HTML::Element'). If you supply a module (or list of
modules), `import' will automatically load them (if
needed) and set up *their* node classes with the proper
inheritance; that is, it will append `MyExt::Mod::Element'
and `YAPE::HTML::Element' to each node class's `@ISA' (where
`MyExt::Mod' is the name of the module being used).
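The effect of that `@ISA' setup can be seen in a small, self-contained Perl sketch. `Demo::Element' is a hypothetical stand-in for `YAPE::HTML::Element', and `MyExt::Mod::Element' and `MyExt::Mod::text' are hypothetical extension classes; none of the method bodies come from YAPE::HTML. The sketch only shows how appending both Element classes to a node class's `@ISA' makes a method resolve first in the extension and then in the base class.

```perl
use strict;
use warnings;

# Demo::Element stands in for YAPE::HTML::Element (hypothetical bodies).
package Demo::Element;
sub describe   { return 'base element' }
sub fullstring { my ($self) = @_; return $self->{TEXT} }

# Hypothetical extension base class: overrides describe() only.
package MyExt::Mod::Element;
sub describe { return 'extension element' }

# Hypothetical node class belonging to the extension module.
package MyExt::Mod::text;
sub new { my ($class, $text) = @_; return bless { TEXT => $text }, $class }

package main;

# Append the extension's Element class, then the base Element class,
# to each node class's @ISA, in the order described for import().
{
    no strict 'refs';
    for my $class ('MyExt::Mod::text') {
        push @{"${class}::ISA"}, 'MyExt::Mod::Element', 'Demo::Element';
    }
}

my $node = MyExt::Mod::text->new("Here's my cat:");
print $node->describe, "\n";      # found in MyExt::Mod::Element first
print $node->fullstring, "\n";    # falls through to Demo::Element
```

Because Perl searches `@ISA' left to right, putting the extension's Element class first lets it override base behavior while still inheriting everything it does not define.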
* `my $p = YAPE::HTML->new($HTML, $strict);'
Creates a `YAPE::HTML' object, using the contents of the
`$HTML' string as its HTML to parse. The optional second
argument determines whether this parser instance will demand
strict comment parsing and require all tags to be closed
with a closing tag or a `/' at the end of the tag. Any true
value (except for the special string `-NO_STRICT') will turn
strict parsing on. This is off by default. (This could be
considered a bug.)
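The rule for the second argument to `new' can be restated as a tiny predicate. `strict_enabled' is a hypothetical helper written only for illustration, not a YAPE::HTML function, and it assumes the flag is compared as a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented rule: any true value
# turns strict parsing on, except the special string -NO_STRICT.
sub strict_enabled {
    my ($flag) = @_;
    return ($flag && $flag ne '-NO_STRICT') ? 1 : 0;
}

# The bareword -NO_STRICT evaluates to the string '-NO_STRICT',
# so the quoted form is used here.
for my $flag (undef, 0, 1, 'yes', '-NO_STRICT') {
    printf "%-12s => %d\n", defined $flag ? $flag : 'undef',
        strict_enabled($flag);
}
```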
* `my $text = $p->chunk($len);'
Returns the next `$len' characters in the input string;
`$len' defaults to 30 characters. This is useful for
figuring out why a parsing error occurs.
* `my $done = $p->done;'
Returns true if the parser is done with the input string,
and false otherwise.
* `my $errstr = $p->error;'
Returns the parser error message.
* `my $coderef = $p->extract(...);'
Returns a code reference that returns the next object that
matches the criteria given in the arguments. The arguments
can take several forms:
all text:
$p->extract(-TEXT);
all comments:
$p->extract(-COMMENT);
all tags:
$p->extract(-TAG);
specific tags:
$p->extract(b => [], i => [], u => []);
specific tags with specific attributes:
$p->extract(a => ['href','target']);
regex object to match tags:
$p->extract(qr/^h[1-6]$/ => []);
regex object with specific attributes:
$p->extract(qr/^h[1-6]$/ => ['align']);
or any combination of these, with one restriction: the three
constants must appear before any of the tag-attribute
pairs:
$p->extract(
  -COMMENT,              # all comments
  div => ['align'],      # with ALIGN attr
  qr/^t[drh]$/ => [],    # td, th, and tr tags
);
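The returned code reference behaves like an iterator closure: each call yields the next match, and `undef' once nothing is left. The sketch below reproduces that pattern over a hypothetical list of plain hash nodes rather than real YAPE::HTML objects. `make_extractor' is an invented name, the type names `comment', `text', and `tag' stand in for `-COMMENT', `-TEXT', and `-TAG', and the sketch assumes a tag matches only when it carries every listed attribute, which the description above does not spell out.

```perl
use strict;
use warnings;

# Hypothetical flat list of parsed nodes (plain hashes, not YAPE::HTML objects).
my @nodes = (
    { type => 'comment', text => ' navigation ' },
    { type => 'tag',  name => 'div', attr => { align => 'center' } },
    { type => 'text', text => 'Hello' },
    { type => 'tag',  name => 'td',  attr => {} },
    { type => 'tag',  name => 'div', attr => {} },
);

# make_extractor(\@nodes, \@types, \@pairs) returns a closure. Each call
# yields the next node whose type is in @types, or a tag matching one of
# the name => [attributes] pairs; it returns undef when exhausted.
# Names may be plain strings or qr// objects.
sub make_extractor {
    my ($nodes, $types, $pairs) = @_;
    my %want_type = map { $_ => 1 } @$types;
    my $i = 0;    # position shared by successive calls of the closure
    return sub {
        while ($i < @$nodes) {
            my $n = $nodes->[ $i++ ];
            return $n if $want_type{ $n->{type} };
            next unless $n->{type} eq 'tag';
            for (my $j = 0; $j < @$pairs; $j += 2) {
                my ($pat, $attrs) = @{$pairs}[ $j, $j + 1 ];
                my $name_ok = ref($pat) eq 'Regexp'
                    ? $n->{name} =~ $pat
                    : $n->{name} eq $pat;
                next unless $name_ok;
                # Assumption: every listed attribute must be present.
                return $n unless grep { !exists $n->{attr}{$_} } @$attrs;
            }
        }
        return;
    };
}

my $next = make_extractor(\@nodes, ['comment'],
    [ div => ['align'], qr/^t[drh]$/ => [] ]);
my @found;
while (my $n = $next->()) {
    push @found, $n->{type} eq 'tag' ? $n->{name} : $n->{type};
}
print "@found\n";    # comment div td
```

The tag-attribute criteria are kept as an array of pairs rather than a hash so that `qr//' objects stay regexes instead of being stringified as hash keys.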
* `my $string = $p->display(...);'
Returns a string representation of the entire content. It
first calls the `parse' method in case there is data that has
not yet been parsed, then calls the `fullstring' method on
the root nodes. See the `YAPE::HTML::Element' documentation
for the arguments to `fullstring'.
* `my $node = $p->next;'
Returns the next token, or `undef' if there is no valid
token. If there was a problem in the parsing, an error
message will be available through the `error' method.
* `my $node = $p->parse;'
Calls `next' until all the data has been parsed.
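To make the contract between `next', `parse', `done', `error', `state', and `chunk' concrete, here is a heavily simplified sketch. `Toy::Parser' is a hypothetical class invented for this illustration, not part of YAPE::HTML: it recognizes only bare open tags, close tags, and text, lowercases tag names as described for the `state' method, and (unlike the documented `parse') its `parse' returns an array reference of the tokens it collected.

```perl
use strict;
use warnings;

# Hypothetical toy tokenizer; ignores comments, entities, and attributes.
package Toy::Parser;

sub new {
    my ($class, $html) = @_;
    return bless { input => $html, state => '', error => '' }, $class;
}

# Consume one token from the front of the input. Returns undef at the
# end of input (state 'done') or on failure (state 'error').
sub next {
    my $self = shift;
    for ($self->{input}) {    # alias $_ to the remaining input
        if ($_ eq '') { $self->{state} = 'done'; return }
        if (s{^</(\w+)\s*>}{}) {
            $self->{state} = 'close(' . lc($1) . ')';
            return +{ type => 'close', name => lc($1) };
        }
        if (s{^<(\w+)[^>]*>}{}) {
            $self->{state} = 'open(' . lc($1) . ')';
            return +{ type => 'open', name => lc($1) };
        }
        if (s{^([^<]+)}{}) {
            $self->{state} = 'text';
            return +{ type => 'text', text => $1 };
        }
        $self->{state} = 'error';
        $self->{error} = 'cannot parse: ' . $self->chunk;
        return;
    }
}

# Next $len characters of unparsed input (default 30).
sub chunk {
    my ($self, $len) = @_;
    $len = 30 unless defined $len;
    return substr($self->{input}, 0, $len);
}

sub done  { return $_[0]{state} eq 'done' }
sub error { return $_[0]{error} }
sub state { return $_[0]{state} }

# Call next() until it returns undef, collecting the tokens.
sub parse {
    my $self = shift;
    my @tokens;
    while (my $tok = $self->next) { push @tokens, $tok }
    return \@tokens;
}

package main;

my $p = Toy::Parser->new('<B>Here is</b> <i');
my $tokens = $p->parse;
printf "%d tokens, state=%s\n", scalar @$tokens, $p->state;
print 'error: ', $p->error, "\n" unless $p->done;
```

Run on the sample input, the toy parser stops at the unterminated `<i', sets its state to `error', and uses `chunk' to show where it got stuck, which is the kind of diagnosis `chunk' is described as being useful for.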
* `my $attr = $p->quote($string);'
Returns a quoted string, suitable for using as an attribute.
It turns any embedded `"' characters into `&quot;'. This can
also be called as a plain function:
my $quoted = YAPE::HTML::quote($string);
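A minimal sketch of the escaping described for `quote', assuming the result is also wrapped in double quotes so it can be placed directly in a tag. `quote_attr' is a hypothetical name, and the real function may differ (for example, in whether it also escapes `&').

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented behavior: turn every embedded
# double quote into &quot;. Assumption: the result is wrapped in quotes.
sub quote_attr {
    my ($string) = @_;
    (my $escaped = $string) =~ s/"/&quot;/g;
    return qq{"$escaped"};
}

print quote_attr('say "hi"'), "\n";    # "say &quot;hi&quot;"
```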
* `my $root = $p->root;'
Returns an array reference holding the root nodes of the tree
structure; for documents that contain multiple top-level
tags, this array will have more than one element.
* `my $state = $p->state;'
Returns the current state of the parser. It is one of the
following values: `close(TAG)', `comment', `done', `error',
`open(TAG)', `text', `text(script)', or `text(xmp)'. The
`open' and `close' states contain the name of the element in
parentheses (e.g. `open(img)'). Tag names, as well as the
names of attributes, are converted to lowercase. The state
of `text(script)' refers to text found inside an ` |