ParaTools 1.00 Documentation - Required Software


What software does Biblio::DocParser need?

Perl Modules

The ParaTools::Utils module provides functions to retrieve and convert files, both from the Internet and from a local file-system. Retrieving files from the Internet requires a few extra Perl modules:

LWP::Simple and LWP::UserAgent
These are Perl modules that provide an interface to the World Wide Web, and are used by the ParaTools DocParser to retrieve documents from the Internet.

File::Temp
This module handles temporary files across multiple platforms.

The modules above have some dependencies of their own, including MIME::Base64, HTML::Tagset, and Digest::MD5.

All of the above are available at http://paracite.eprints.org/files/perlmods/. Although these are not guaranteed to be the most recent versions, they are the versions that ParaTools has been tested with. For the most recent releases, the Perl modules can also be found at http://www.cpan.org.
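Before installing anything, it can help to see which of these modules Perl can already load. The following is only a sketch and is not part of ParaTools: the function name missing_perl_modules and the PERL override variable are invented for illustration. It relies on Perl's standard -M switch, which loads a module and makes perl exit with a non-zero status when the module cannot be found.

```shell
# Hypothetical helper (not part of ParaTools): print each named Perl
# module that the interpreter cannot load. Set PERL to use a different
# interpreter; it defaults to "perl".
missing_perl_modules() {
    for module in "$@"; do
        # -M<module> loads the module; -e 1 then runs a trivial program.
        # A non-zero exit status means the module failed to load.
        if ! "${PERL:-perl}" "-M$module" -e 1 >/dev/null 2>&1; then
            printf '%s\n' "$module"
        fi
    done
}

# Example call covering the modules named in this section:
# missing_perl_modules LWP::Simple LWP::UserAgent File::Temp \
#     MIME::Base64 HTML::Tagset Digest::MD5
```

Any module name the function prints still needs to be installed; no output means every listed module loaded.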

Installing Perl Modules

This section describes how to install a simple Perl module; some modules require a bit more effort. We will use the non-existent FOO module as an example.

Unpack the archive:
 % gunzip FOO-5.23.tar.gz
 % tar xf FOO-5.23.tar
Enter the directory this creates:
 % cd FOO-5.23
Run the following commands:
 % perl Build.PL
 % ./Build
 % ./Build test
 % ./Build install
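Many CPAN archives ship a Makefile.PL rather than a Build.PL; for those, the equivalent sequence is perl Makefile.PL, make, make test, make install. As a rough sketch, the hypothetical helper below (the name install_steps is an assumption, not a ParaTools or CPAN tool) prints the steps that match whichever file an unpacked directory contains. It only prints the commands and does not run them.

```shell
# Hypothetical helper: print the install commands that suit the
# unpacked module directory given as $1 (default: current directory).
install_steps() {
    dir=${1:-.}
    if [ -f "$dir/Build.PL" ]; then
        # Module::Build style distribution
        printf '%s\n' 'perl Build.PL' './Build' './Build test' './Build install'
    elif [ -f "$dir/Makefile.PL" ]; then
        # ExtUtils::MakeMaker style distribution
        printf '%s\n' 'perl Makefile.PL' 'make' 'make test' 'make install'
    else
        echo "no Build.PL or Makefile.PL found in $dir" >&2
        return 1
    fi
}

# Example: install_steps FOO-5.23
```

If a directory contains both files, this sketch prefers Build.PL; that ordering is a choice made here, not a rule.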

Document Converters

These programs are used by the Biblio::DocParser::Utils module to convert documents to ASCII from other formats. If you would like to add other formats, see the HOWTO later in this manual.

wvText
This is part of the wvWare package, and provides a command to convert Word documents into ASCII, as well as into other formats.

wvWare is available from: http://www.wvware.com/wvWare.html

pdftotext
This is provided with xpdf, and can convert PDF to ASCII.

Xpdf is available from: http://www.foolabs.com/xpdf/download.html

pstotext
pstotext is a program that works with GhostScript to convert PS and PDF files to ASCII.

pstotext is available from: http://www.research.compaq.com/SRC/virtualpaper/pstotext.html

GhostScript is available from: http://www.cs.wisc.edu/~ghost/

links
Links is a text-mode web browser that can display complex pages with tables and frames. Its dump option (-dump) writes the rendered page out as plain text, which ParaTools::Utils uses to convert HTML to ASCII.

Links is available from: http://artax.karlin.mff.cuni.cz/~mikulas/links/
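To tie the four converters together, here is a minimal sketch of choosing one by file extension. It is not how ParaTools itself dispatches: the function name to_ascii_command and the extension-to-tool mapping are assumptions for illustration. The function only prints the command line it would use, based on each tool's basic invocation (wvText and pdftotext take an input and an output file; pstotext and links -dump write to standard output).

```shell
# Hypothetical helper: print a command that would convert the input
# file $1 to plain text in the output file $2, chosen by extension.
# File names containing spaces are not handled in this sketch.
to_ascii_command() {
    in=$1
    out=$2
    case "$in" in
        *.doc)        echo "wvText $in $out" ;;
        *.pdf)        echo "pdftotext $in $out" ;;
        *.ps)         echo "pstotext $in > $out" ;;
        *.html|*.htm) echo "links -dump $in > $out" ;;
        *)            echo "no converter known for $in" >&2
                      return 1 ;;
    esac
}

# Example: to_ascii_command paper.pdf paper.txt
```

Printing the command instead of running it makes it easy to check the mapping before any converter is installed.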
