=head1 NAME

XML::TiePYX - Read XML data in PYX format via tied filehandle

=head1 SYNOPSIS

  use XML::TiePYX;

  tie *XML,'XML::TiePYX','file.xml'

  open IN,'file.xml' or die $!;
  tie *XML,'XML::TiePYX',\*IN,Condense=>0;

  my $text='<tag xmlns="http://www.omsdev.com">text</tag>';
  tie *XML,'XML::TiePYX',\$text,Namespaces=>1;

=head1 DESCRIPTION

XML::TiePYX lets you read an XML file or string in PYX format from a tied 
filehandle.  PYX is a line-oriented, parsed representation of XML developed 
by Sean McGrath (http://www.pyxie.org).  Each line corresponds to one 
"event" in the XML, with the first character indicating the type of event:

=over 4

=item (

The start of an element; the rest of the line is its name.

=item A

An attribute; the rest of the line is the attribute's name, a space, and its value.

=item )

The end of an element; the rest of the line is its name.

=item -

Literal text (characters).  The rest of the line is the text.

=item ?

A processing instruction.  The rest of the line is the instruction's target, a space, and the instruction's value.

=back

Newlines in attribute values, text, and processing instruction values are 
represented as the literal sequence '\n' (that is, a backslash followed by 
an 'n').  By default, consecutive runs of characters are always gathered 
into a single text event, but this behavior can be disabled.  Comments are 
*not* available through PYX.

Just as SAX is an API well suited to "push"-mode XML parsing, PYX is well- 
suited to "pull"-mode parsing where you want to capture the state of the 
parse through your program's flow of code rather than through a bunch of 
state variables.  This module uses incremental parsing to avoid the need to 
buffer up large numbers of events.

This module implements an (unofficial) extension to the PYX format to allow 
namespace processing.  If namespaces are enabled, an element or attribute 
name will be prefixed by its namespace URI (*NOT* any namespace prefix used 
in the document) enclosed in curly braces.  A name with no namespace will 
be prefixed with {}.


=head1 INTERFACE

  tie *tied_handle, 'XML::TiePYX', source, [Option=>value,...]

I<tied_handle> is the filehandle from which the PYX events will be read.  

I<source> is either a reference to a string containing the XML, the name of 
a file containing the XML, or an open IO::Handle or filehandle glob 
reference from which the XML can be read.

The I<Option>s can be any options allowed by XML::Parser and 
XML::Parser::Expat, as well as two module-specific options.  I<Validating> 
will provide a validating parse by using XML::Checker::Parser in place of 
XML::Parser if set to a true value.  I<Condense>, if set to a true value 
(the default), causes all consecutive runs of character data to be gathered 
up into a single PYX event.  If set false, multiple consecutive character 
data events may occur in the stream (which may be desirable when dealing 
with large chunks of text).

The tied filehandle may be read from using either the diamond operator 
(<HANDLE>), getc(), or read().  The diamond operator always returns a line 
at a time regardless of the setting of $/.

=head1 EXAMPLE

This program (B<psectp.plx> in the distribution) prints a numbered outline from an 
XML file in which an <outline> can contain zero or more <sect>s, each with 
a I<title> attribute, and each <sect> can contain zero or more nested 
<sect>s or <para>s containing text, as in the B<sects.otl> file included with 
the distribution.  The -c option makes it print just a table of contents.

This is actually a traditional recursive-descent parser using PYX events as tokens.

  #!/usr/bin/perl -w

  use strict;
  use XML::TiePYX;
  use Text::Wrap;
  use Getopt::Std;

  my (@sectnums,%opts);

  getopts('c',\%opts);

  die "usage: psect [-c] file\n" unless @ARGV==1;

  tie *XML,'XML::TiePYX',$ARGV[0];
  die "illegal structure" unless get_event() =~ /^\(outline/;
  push @sectnums,0;
  print_sect() while get_event() =~ /^\(sect/;
  die unless /^\)outline/;
  close XML;

  sub print_sect {
    <XML>=~/^Atitle (.*)/ or die "missing title";
    ++$sectnums[-1];
    print ' ' x (4*$#sectnums),join('.',@sectnums)," $1\n";
    print "\n" unless $opts{c};
    push @sectnums,0;
    while (get_event() !~ /^\)sect/) {
      /^\(sect/ and print_sect(),next;
      /^\(para/ and print_para(),next;
      die "illegal structure";
    }
    pop @sectnums;    
  }
  
  sub print_para() {
    die "illegal structure" unless <XML> =~ /^-(.*)/;
    $_=$1;
    s/\\n/ /g;
    s/^\s+//;
    s/\s+$//;
    print wrap((' ' x (4*($#sectnums-1))) x 2,$_),"\n\n" unless $opts{c};
    die "illegal structure" unless <XML> =~ /^\)para/;
  }

  sub get_event {
    $_=<XML>;
    $_=<XML> if /^-(\s|\\n)*$/;
    $_;
  }

=head1 RATIONALE

There's already an XML::PYX module (written by Matt Sergeant) available, so 
why another PYX implementation?  Mainly because XML::PYX is intended to be 
used in a standalone PYX-outputting program which you open as a pipe.  That 
works very well under Unix, aside from the overhead of forking a separate 
process, but is problematic on Win32 systems for a variety of niggling 
reasons: the standalone script is supplied as a batch file, whose output 
can't be properly redirected into a pipe unless you invoke it as 'perl 
/perl/bin/pyx|' instead of just 'pyx|'.  Both Win95 and Win98, as well as 
possibly other Win32 systems, implement pipes using temporary files and the 
reading process can't start reading until the writing process is done 
writing, which means that if you're parsing a huge file you may have to 
wait a long time before getting *any* output.  The ability to guarantee a 
single character data event for any run of characters can often simplify 
processing.  And finally, when I wrote this the only supported namespace- 
aware way to parse XML was the raw handlers interface of XML::Parser, which 
is needlessly complicated for simple applications (there are, of course, 
those who would argue that "simple applications" and "namespace-aware" are 
mutually-exclusive categories).

=head1 BUGS

The I<Validating> option does not work correctly, as XML::Checker::Parser 
does not implement the parse_start() method.

Error handling leaves much to be desired.

=head1 TODO

Write mode (PYX to XML).

Public ID translation using XML::Catalog;

=head1 AUTHOR

Eric Bohlman (ebohlman@netcom.com, ebohlman@omsdev.com)

=head1 COPYRIGHT

Copyright 2000 Eric Bohlman.  All rights reserved.

This program is free software; you can use/modify/redistribute it under the
same terms as Perl itself.

=head1 SEE ALSO

  XML::PYX
  XML::Parser
  XML::Parser::Expat
  XML::Checker
  perl(1).

=cut