NAME

Glynx - a download manager.


DESCRIPTION

Glynx makes a local image of a selected part of the internet.

It can also generate download lists for use with other download managers, enabling a distributed download process.

It currently supports resume/retry, referer, user-agent, frames, distributed download (see --slave, --stop, --restart).

It partially supports: redirects (using file copies), Java, JavaScript, multimedia, basic authentication, mirroring, translating links for use on the local computer (--makerel), correcting file extensions, FTP, shortening over-long filenames and overly deep directory paths, cookies, proxies, and forms.

A very basic CGI user interface is included.

Not yet tested: ``https:''.

Tested on Linux and Windows NT.


SYNOPSIS

Do everything at once:

 glynx.pl [options] <URL>

Save work to finish later:

 glynx.pl [options] --dump="dump-file" <URL>

Finish a saved download:

 glynx.pl [options] "download-list-file"

Network mode (client/slave):

- Clients:

 glynx.pl [options] --dump="dump-file" <URL>

- Slaves (will wait until there is something to do):

 glynx.pl [options] --slave


HINTS

How to create a default configuration:

        Start the program with all desired options on the command line, plus --cfg-save
        or:
        1 - start the program with --cfg-save
        2 - edit the glynx.ini file
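
For example, to save a default configuration that sleeps 1 second between gets and skips ".zip" files (the option values are only illustrative):

   glynx.pl --sleep=1 --exclude=/\.zip/ --cfg-save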

--subst, --exclude and --loop use regular expressions.

   http://www.site.com/old.htm --subst=s/old/new/
   downloads: http://www.site.com/new.htm

   - Note: the substitution string MUST be made of "valid URL" characters

   --exclude=/\.gif/
   will not download ".gif" files

   - Note: Multiple --exclude are allowed:

   --exclude=/gif/  --exclude=/jpeg/
   will not download ".gif" or ".jpeg" files

   It can also be written as:
   --exclude=/\.gif|\.jp.?g/i
   matching .gif, .GIF, .jpg, .jpeg, .JPG, .JPEG

   --exclude=/www\.site\.com/
   will not download links containing the site name

   http://www.site.com/bin/index.htm --prefix=http://www.site.com/bin/
   won't download outside of the "/bin" directory. The prefix must end with a slash "/".

   http://www.site.com/index%%%.htm --loop=%%%:0..3
   will download:
     http://www.site.com/index0.htm
     http://www.site.com/index1.htm
     http://www.site.com/index2.htm
     http://www.site.com/index3.htm

   - Note: the strings generated by --loop MUST be made of "valid URL" characters

- For multiple exclusion: use ``|''.

- Don't read directory-index sort links (?D=D ?D=A ?S=D ?S=A ?M=D ?M=A ?N=D ?N=A); exclude them with the pattern:

        \?[DSMN]=[AD]

        To change the default "exclude" pattern, put it in the configuration file.
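
For example, a command line using that pattern (the URL is only illustrative; exact shell quoting may vary):

   glynx.pl --exclude='/\?[DSMN]=[AD]/' http://www.site.com/files/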

Note: the ``File:'' item in a dump file is ignored.

You can filter the processing of a dump file using --prefix, --exclude, and --subst.
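
For example, to process a saved download list while staying inside one site and skipping images (the site name is only illustrative):

   glynx.pl --prefix=http://www.site.com/ --exclude=/\.gif|\.jpe?g/i "download-list-file"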

If ``.PART._BUSY_'' files remain in the base directory after downloading finishes, rename them to ``.PART'' (the program should do this by itself).

Don't do this: --depth=1 --out-depth=3, because ``out-depth'' is an upper limit that is tested after the depth is generated. The right way is: --depth=4 --out-depth=3

This will do nothing:

 --dump=x graphic.gif

because binary files are always written to the dump file instead of being downloaded directly.

Errors using https:

 [ ERROR 501 Protocol scheme 'https' is not supported => LATER ] or
 [ ERROR 501 Can't locate object method "new" via package "LWP::Protocol::https" => LATER ]

This means you need to install at least ``openssl'' (http://www.openssl.org), plus the Net::SSLeay and IO::Socket::SSL Perl modules.
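
One common way to install the Perl modules is through the CPAN shell (``openssl'' itself usually comes from your system packages or from source):

 perl -MCPAN -e 'install Net::SSLeay'
 perl -MCPAN -e 'install IO::Socket::SSL'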


COMMAND-LINE OPTIONS

Check --help for default values.

Very basic:

  --version         Print version number and quit
  --verbose         More output
  --quiet           No output
  --help            Help page
  --cfg-save        Save configuration to file
  --base-dir=DIR    Place to load/save files

Download options are:

  --sleep=SECS      Sleep between gets, i.e. go slowly
  --prefix=PREFIX   Limit URLs to those which begin with PREFIX
                    Multiple "--prefix" are allowed.
  --depth=N         Maximum depth to traverse
  --out-depth=N     Maximum depth to traverse outside of PREFIX
  --referer=URI     Set initial referer header
  --limit=N         A limit on the number of documents to get
  --retry=N         Maximum number of retries
  --timeout=SECS    Timeout value - increases on retries
  --agent=AGENT     User agent name
  --mirror          Checks all existing files for updates
  --nomirror        Do not check for updates -- if file exists, it's ok
  --mediaext        Creates a file link, guessing the media type extension (.jpg, .gif)
                    (perl actually makes a file copy)
  --nomediaext      Do not try to change media type extension
  --makerel         Make Relative links. Links in pages will work in the
                    local computer.
  --nomakerel       Keep links as they are. Do not try to change links.
  --auth=USER:PASS  Set authentication credentials
  --cookies=FILE    Set up a cookies file (default is no cookies)
  --name-len-max    Limit filename size
  --dir-depth-max   Limit directory depth
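
A sketch of a typical invocation combining several of these options (the site name and option values are only illustrative):

  glynx.pl --base-dir=/tmp/mirror --prefix=http://www.site.com/ --depth=3 --sleep=1 --mirror --makerel http://www.site.com/index.htm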

Multi-process control:

  --slave           Wait until a download-list file is created (be a slave)
  --stop            Stop slave
  --restart         Stop and restart slave

Not implemented yet but won't generate fatal errors (compatibility with lwp-rget):

  --hier            Download into hierarchy (not all files into cwd)
  --iis             Workaround IIS 2.0 bug by sending "Accept: */*" MIME
                    header; translates backslashes (\) to forward slashes (/)
  --keepext=type    Keep file extension for MIME types (comma-separated list)
  --nospace         Translate spaces in URLs (not #fragments) to underscores (_)
  --tolower         Translate all URLs to lowercase (useful with IIS servers)

Other options (to be explained in more detail):

  --indexfile=FILE  Index file in a directory
  --part-suffix=.SUFFIX  Extension to use for partial downloads 
                    (example: ".Getright" ".PART")
  --dump=FILE       Make a download-list file, to be used later
  --dump-max=N      Number of links per download-list file
  --invalid-char=C  Character to use in substitutions for invalid characters
  --exclude=/REGEXP/i  Don't download matching URLs
                    Multiple --exclude are allowed
  --loop=REGEXP:INITIAL..FINAL  Expand a URL through substitutions 
                    (example: xx:a,b,c  xx:'01'..'10')
  --subst=s/REGEXP/VALUE/i  Substitute a string in the URLs
  --404-retry       Retry on error 404 Not Found
  --no404-retry     Create an empty file on error 404 Not Found
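
A sketch combining some of these options to expand a numbered page series into download-list files of at most 50 links each (the URL and values are only illustrative):

  glynx.pl --dump=list --dump-max=50 --loop=%%%:0..9 http://www.site.com/page%%%.htm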


COPYRIGHT

Copyright (c) 2000 Flavio Glock <fglock@pucrs.br>. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. This program was based on examples in the Perl distribution.

If you use it/like it, send a postcard to the author.