NAME
XML::TreePP -- Pure Perl implementation for parsing/writing xml files
SYNOPSIS
parse xml file into hash tree
use XML::TreePP;
my $tpp = XML::TreePP->new();
my $tree = $tpp->parsefile( "index.rdf" );
print "Title: ", $tree->{"rdf:RDF"}->{item}->[0]->{title}, "\n";
print "URL: ", $tree->{"rdf:RDF"}->{item}->[0]->{link}, "\n";
write xml as string from hash tree
use XML::TreePP;
my $tpp = XML::TreePP->new();
my $tree = { rss => { channel => { item => [ {
title => "The Perl Directory",
link => "http://www.perl.org/",
}, {
title => "The Comprehensive Perl Archive Network",
link => "http://cpan.perl.org/",
} ] } } };
my $xml = $tpp->write( $tree );
print $xml;
get remote xml file with HTTP-GET and parse it into hash tree
use XML::TreePP;
my $tpp = XML::TreePP->new();
my $tree = $tpp->parsehttp( GET => "http://use.perl.org/index.rss" );
print "Title: ", $tree->{"rdf:RDF"}->{channel}->{title}, "\n";
print "URL: ", $tree->{"rdf:RDF"}->{channel}->{link}, "\n";
get remote xml file with HTTP-POST and parse it into hash tree
use XML::TreePP;
my $tpp = XML::TreePP->new( force_array => [qw( item )] );
my $cgiurl = "http://search.hatena.ne.jp/keyword";
my $keyword = "ajax";
my $cgiquery = "mode=rss2&word=".$keyword;
my $tree = $tpp->parsehttp( POST => $cgiurl, $cgiquery );
print "Link: ", $tree->{rss}->{channel}->{item}->[0]->{link}, "\n";
print "Desc: ", $tree->{rss}->{channel}->{item}->[0]->{description}, "\n";
DESCRIPTION
XML::TreePP module parses XML file and expands it for a hash tree. And
also generate XML file from a hash tree. This is a pure Perl
implementation. You can also download XML from remote web server like
XMLHttpRequest object at JavaScript language.
EXAMPLES
Parse XML file
Sample XML source:
Yasuhisa
Chizuko
Shiori
Yusuke
Kairi
Sample program to read a xml file and dump it:
use XML::TreePP;
use Data::Dumper;
my $tpp = XML::TreePP->new();
my $tree = $tpp->parsefile( "family.xml" );
my $text = Dumper( $tree );
print $text;
Result dumped:
$VAR1 = {
'family' => {
'-name' => 'Kawasaki',
'father' => 'Yasuhisa',
'mother' => 'Chizuko',
'children' => {
'girl' => 'Shiori'
'boy' => [
'Yusuke',
'Kairi'
],
}
}
};
Details:
print $tree->{family}->{father}; # the father's given name.
The prefix '-' is added on every attribute's name.
print $tree->{family}->{"-name"}; # the family name of the family
The array is used because the family has two boys.
print $tree->{family}->{children}->{boy}->[1]; # The second boy's name
print $tree->{family}->{children}->{girl}; # The girl's name
Text node and attributes:
If a element has both of a text node and attributes or both of a text
node and other child nodes, value of a text node is moved to "#text"
like child nodes.
use XML::TreePP;
use Data::Dumper;
my $tpp = XML::TreePP->new();
my $source = 'Kawasaki Yusuke';
my $tree = $tpp->parse( $source );
my $text = Dumper( $tree );
print $text;
The result dumped is following:
$VAR1 = {
'span' => {
'-class' => 'author',
'#text' => 'Kawasaki Yusuke'
}
};
The special node name of "#text" is used because this elements has
attribute(s) in addition to the text node. See also "text_node_key"
option.
METHODS
new
This constructor method returns a new XML::TreePP object with %options.
$tpp = XML::TreePP->new( %options );
set
This method sets a option value for "option_name". If $option_value is
not defined, its option is deleted.
$tpp->set( option_name => $option_value );
See OPTIONS section below for details.
get
This method returns a current option value for "option_name".
$tpp->get( 'option_name' );
parse
This method reads XML source and returns a hash tree converted. The
first argument is a scalar or a reference to a scalar.
$tree = $tpp->parse( $source );
parsefile
This method reads a XML file and returns a hash tree converted. The
first argument is a filename.
$tree = $tpp->parsefile( $file );
parsehttp
This method receives a XML file from a remote server via HTTP and
returns a hash tree converted.
$tree = $tpp->parsehttp( $method, $url, $body, $head );
$method is a method of HTTP connection: GET/POST/PUT/DELETE $url is an
URI of a XML file. $body is a request body when you use POST method.
$head is a request headers as a hash ref. LWP::UserAgent module or
HTTP::Lite module is required to fetch a file.
( $tree, $xml, $code ) = $tpp->parsehttp( $method, $url, $body, $head );
In array context, This method returns also raw XML source received and
HTTP response's status code.
write
This method parses a hash tree and returns a XML source generated.
$source = $tpp->write( $tree, $encode );
$tree is a reference to a hash tree.
writefile
This method parses a hash tree and writes a XML source into a file.
$tpp->writefile( $file, $tree, $encode );
$file is a filename to create. $tree is a reference to a hash tree.
OPTIONS FOR PARSING XML
This module accepts option parameters following:
force_array
This option allows you to specify a list of element names which should
always be forced into an array representation.
$tpp->set( force_array => [ 'rdf:li', 'item', '-xmlns' ] );
The default value is null, it means that context of the elements will
determine to make array or to keep it scalar or hash. Note that the
special wildcard name '*' means all elements.
force_hash
This option allows you to specify a list of element names which should
always be forced into an hash representation.
$tpp->set( force_hash => [ 'item', 'image' ] );
The default value is null, it means that context of the elements will
determine to make hash or to keep it scalar as a text node. See also
"text_node_key" option below. Note that the special wildcard name '*'
means all elements.
cdata_scalar_ref
This option allows you to convert a cdata section into a reference for
scalar on parsing XML source.
$tpp->set( cdata_scalar_ref => 1 );
The default value is false, it means that each cdata section is
converted into a scalar.
user_agent
This option allows you to specify a HTTP_USER_AGENT string which is used
by parsehttp() method.
$tpp->set( user_agent => 'Mozilla/4.0 (compatible; ...)' );
The default string is 'XML-TreePP/#.##', where '#.##' is substituted
with the version number of this library.
http_lite
This option forces pasrsehttp() method to use a HTTP::Lite instance.
my $http = HTTP::Lite->new();
$tpp->set( http_lite => $http );
lwp_useragent
This option forces pasrsehttp() method to use a LWP::UserAgent instance.
my $ua = LWP::UserAgent->new();
$ua->timeout( 60 );
$ua->env_proxy;
$tpp->set( lwp_useragent => $ua );
You may use this with LWP::UserAgent::WithCache.
base_class
This blesses class name for each element's hashref. Each class is named
straight as a child class of it parent class.
$tpp->set( base_class => 'MyElement' );
my $xml = 'text';
my $tree = $tpp->parse( $xml );
print ref $tree->{root}->{parent}->{child}, "\n";
A hash for element above is blessed to
"MyElement::root::parent::child" class. You may use this with
Class::Accessor.
elem_class
This blesses class name for each element's hashref. Each class is named
horizontally under the direct child of "MyElement".
$tpp->set( base_class => 'MyElement' );
my $xml = 'text';
my $tree = $tpp->parse( $xml );
print ref $tree->{root}->{parent}->{child}, "\n";
A hash for element above is blessed to "MyElement::child" class.
OPTIONS FOR WRITING XML
first_out
This option allows you to specify a list of element/attribute names
which should always appears at first on output XML code.
$tpp->set( first_out => [ 'link', 'title', '-type' ] );
The default value is null, it means alphabetical order is used.
last_out
This option allows you to specify a list of element/attribute names
which should always appears at last on output XML code.
$tpp->set( last_out => [ 'items', 'item', 'entry' ] );
indent
This makes the output more human readable by indenting appropriately.
$tpp->set( indent => 2 );
This doesn't strictly follow the XML Document Spec but does looks nice.
xml_decl
This module generates an XML declaration on writing an XML code per
default. This option forces to change or leave it.
$tpp->set( xml_decl => '' );
output_encoding
This option allows you to specify a encoding of xml file generated by
write/writefile methods.
$tpp->set( output_encoding => 'UTF-8' );
On Perl 5.8.0 and later, you can select it from every encodings
supported by Encode.pm. On Perl 5.6.x and before with Jcode.pm, you can
use "Shift_JIS", "EUC-JP", "ISO-2022-JP" and "UTF-8". The default value
is "UTF-8" which is recommended encoding.
OPTIONS FOR BOTH
utf8_flag
This makes utf8 flag on for every element's value parsed and makes it on
for an XML code generated as well.
$tpp->set( utf8_flag => 1 );
Perl 5.8.1 or later is required to use this.
attr_prefix
This option allows you to specify a prefix character(s) which is
inserted before each attribute names.
$tpp->set( attr_prefix => '@' );
The default character is '-'. Or set '@' to access attribute values like
E4X, ECMAScript for XML. Zero-length prefix '' is available as well, it
means no prefix is added.
text_node_key
This option allows you to specify a hash key for text nodes.
$tpp->set( text_node_key => '#text' );
The default key is "#text".
ignore_error
This module calls Carp::croak function on an error per default. This
option makes all errors ignored and just return.
$tpp->set( ignore_error => 1 );
use_ixhash
This option keeps the order for each element appeared in XML.
Tie::IxHash module is required.
$tpp->set( use_ixhash => 1 );
This makes parsing performance slow. (about 100% slower than default)
AUTHOR
Yusuke Kawasaki, http://www.kawa.net/
COPYRIGHT AND LICENSE
Copyright (c) 2006-2008 Yusuke Kawasaki. All rights reserved. This
program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.