From: Alfie Costa (agcosta@gis.net)
Date: Sun May 21 2000 - 09:07:20 CEST
On 18 May 2000, at 14:50, Michele Andreoli <mulinux@sunsite.auc.dk> wrote:
> > So, starting from a P200 with lots of RAM, I rearchived the EXT and VNC addons, 
> > with the '-1' switch.  These were about 30K bigger than the originals, but they 
> > still fit on the floppy.  Made new floppies with these new .bz2 archives, and 
> > then installed them on the 386 laptop.  Sure enough, it worked -- not as fast 
> > as PKZIP or ARJ, but only minutes to uncompress, and not several hours.  
> 
> MuLinux decompress with bzip2 -ds: the -s option is to reduce memory usage.
True, the -s decompression option helps, but only up to a certain limit.  The 
compression switches were what made the difference between hours and minutes 
on my 4-meg laptop.  Here's how a bzip2 man page explains it:
<snip>
       bzip2 compresses large files in blocks.   The  block  size
       affects  both  the  compression  ratio  achieved,  and the
       amount of memory needed both for  compression  and  decom-
       pression.   The flags -1 through -9 specify the block size
       to be 100,000 bytes through 900,000  bytes  (the  default)
       respectively.   At decompression-time, the block size used
       for compression is read from the header of the  compressed
       file, and bunzip2 then allocates itself just enough memory
       to decompress the file.  Since block sizes are  stored  in
       compressed  files,  it follows that the flags -1 to -9 are
       irrelevant  to  and  so  ignored   during   decompression.
       Compression  and decompression requirements, in bytes, can
       be estimated as:
             Compression:   400k + ( 7 x block size )
             Decompression: 100k + ( 4 x block size ), or
                            100k + ( 2.5 x block size )
       Larger  block  sizes  give  rapidly  diminishing  marginal
       returns;  most of the compression comes from the first two
       or three hundred k of block size, a fact worth bearing  in
       mind  when  using  bzip2  on  small  machines.  It is also
       important to  appreciate  that  the  decompression  memory
       requirement  is  set  at compression-time by the choice of
       block size.
       For files compressed with the  default  900k  block  size,
       bunzip2  will require about 3700 kbytes to decompress.  To
       support decompression of any file on a 4 megabyte machine,
       bunzip2  has  an  option to decompress using approximately
       half this amount of memory, about 2300 kbytes.  Decompres-
       sion  speed  is also halved, so you should use this option
       only where necessary.  The relevant flag is -s.
       In general, try and use the largest block size memory con-
       straints  allow,  since  that  maximises  the  compression
       achieved.  Compression and decompression speed are  virtu-
       ally unaffected by block size.
       Another  significant point applies to files which fit in a
       single block -- that  means  most  files  you'd  encounter
       using  a  large  block  size.   The  amount of real memory
       touched is proportional to the size of the file, since the
       file  is smaller than a block.  For example, compressing a
       file 20,000 bytes long with the flag  -9  will  cause  the
       compressor  to  allocate  around 6700k of memory, but only
       touch 400k + 20000 * 7 = 540 kbytes of it.  Similarly, the
       decompressor  will  allocate  3700k  but only touch 100k +
       20000 * 4 = 180 kbytes.
       Here is a table which summarises the maximum memory  usage
       for  different  block  sizes.   Also recorded is the total
       compressed size for 14 files of the Calgary Text  Compres-
       sion  Corpus totalling 3,141,622 bytes.  This column gives
       some feel for how  compression  varies  with  block  size.
       These  figures  tend to understate the advantage of larger
       block sizes for larger files, since the  Corpus  is  domi-
       nated by smaller files.
                  Compress   Decompress   Decompress   Corpus
           Flag     usage      usage       -s usage     Size
            -1      1100k       500k         350k      914704
            -2      1800k       900k         600k      877703
            -3      2500k      1300k         850k      860338
            -4      3200k      1700k        1100k      846899
            -5      3900k      2100k        1350k      845160
            -6      4600k      2500k        1600k      838626
            -7      5400k      2900k        1850k      834096
            -8      6000k      3300k        2100k      828642
            -9      6700k      3700k        2350k      828642
<snip>
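The formulas in the man page are easy to check by hand.  A small shell sketch (plain POSIX arithmetic, nothing muLinux-specific) that reproduces the table's memory columns from the flag value:

```shell
# Memory estimates from the man page formulas, in kbytes.
# The -1..-9 flag selects a block size of flag x 100k.
for flag in 1 5 9; do
    block=$((flag * 100))               # block size in kbytes
    comp=$((400 + 7 * block))           # compression:   400k + 7 x block
    decomp=$((100 + 4 * block))         # decompression: 100k + 4 x block
    decomp_s=$((100 + 5 * block / 2))   # with -s:       100k + 2.5 x block
    echo "-$flag: compress ${comp}k, decompress ${decomp}k, -s ${decomp_s}k"
done
```

For -9 this gives 6700k, 3700k, and 2350k, matching the table above.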
The original mu .bz2 files were compressed with flag -9.  Reading the above 
chart, this means that bunzip2 -s needs 2350K to uncompress these files.  
That is more memory than the old laptop could spare, so it spent hours 
swapping to disk.
I experimented with the GCC addon, which, when compressed with the -1 flag, 
was too big to fit on a floppy.  It turns out that the -5 switch gives 
tolerable decompression speed on this laptop.  Here is a list of several 
experiments...
Size    Date         filename/filetype
5724160 May 19 09:27 gcc:      GNU tar archive
1917731 May 19 10:04 gcc.gzip: gzip compressed data, max compression
1917813 May 19 10:03 gcc.zip:  Zip archive data, at least v2.0 to extract
1853573 May 19 09:27 gcc1.bz2: bzip2 compressed data, block size = 100k
1800589 May 19 09:29 gcc2.bz2: bzip2 compressed data, block size = 200k
1781732 May 19 08:24 gcc3.bz2: bzip2 compressed data, block size = 300k
1778375 May 19 09:22 gcc4.bz2: bzip2 compressed data, block size = 400k
1753495 May 19 09:25 gcc5.bz2: bzip2 compressed data, block size = 500k
1752332 Jul 25 1999  gcc.tgz:  bzip2 compressed data, block size = 900k
The smallest file is the original 900k block size bzip2 file with the .tgz 
extension.  The 500k-block .bz2 comes close: it's only about 1K bigger, so it 
still fits on a floppy, yet needs only 1350K of RAM to decompress.  
Surprisingly, the laptop was happy with this '-5' file.  The other archives 
are too big to fit in 1722K.  (1722K = 1763328 bytes)
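For anyone who wants to repeat the comparison, it takes only a short loop.  This is a sketch; it assumes the uncompressed addon is a tar archive named "gcc" in the current directory:

```shell
# Compress the same tar archive once per block size and list the results.
# The -c flag writes to stdout so the original file is kept intact.
for n in 1 2 3 4 5; do
    bzip2 -$n -c gcc > gcc$n.bz2
done
ls -l gcc?.bz2
```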
If this sort of disk/RAM compromise is worthwhile, perhaps the script that 
makes mu's disks might benefit from an optional function that finds the 
smallest bzip2 block size whose output still fits on one disk.  There would 
be a loop that tries bzip2 switches -1 through -9 (or maybe -5 through -9?) 
and compares the file size to a limiting value.  It would mean bigger 
archives, but they'd work better on smaller machines.  Almost a paradox.
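That loop could look something like the sketch below.  This is only an illustration, not the actual mu disk-making script; the "addon" name and the stop-at-first-fit policy are my assumptions:

```shell
# Hypothetical sketch: pick the smallest block size that fits on a floppy.
LIMIT=1763328          # 1722K floppy capacity, in bytes
addon=gcc              # the uncompressed addon tar archive (assumed name)
for n in 1 2 3 4 5 6 7 8 9; do
    bzip2 -$n -c "$addon" > "$addon.bz2.try"
    size=$(wc -c < "$addon.bz2.try")
    if [ "$size" -le "$LIMIT" ]; then
        mv "$addon.bz2.try" "$addon.bz2"
        echo "using -$n: $size bytes fits in $LIMIT"
        break
    fi
done
```

Since smaller block sizes compress worse but decompress in less memory, stopping at the first flag that fits gives the most RAM-friendly archive that still goes on one disk.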
Whatever the case for mu's archives, maybe this info can help other low-RAM 
muLinux enthusiasts.
---------------------------------------------------------------------
To unsubscribe, e-mail: mulinux-unsubscribe@sunsite.auc.dk
For additional commands, e-mail: mulinux-help@sunsite.auc.dk
This archive was generated by hypermail 2.1.6 : Sat Feb 08 2003 - 15:27:14 CET