Re: less than 4 MB RAM / 2 bzip or not 2 bzip?

From: Alfie Costa (agcosta@gis.net)
Date: Sun May 21 2000 - 09:07:20 CEST


On 18 May 2000, at 14:50, Michele Andreoli <mulinux@sunsite.auc.dk> wrote:

> > So, starting from a P200 with lots of RAM, I rearchived the EXT and VNC addons,
> > with the '-1' switch. These were about 30K bigger than the originals, but they
> > still fit on the floppy. Made new floppies with these new .bz2 archives, and
> > then installed them on the 386 laptop. Sure enough, it worked -- not as fast
> > as PKZIP or ARJ, but only minutes to uncompress, and not several hours.
>
> MuLinux decompresses with bzip2 -ds: the -s option is there to reduce memory usage.

True, the -s decompression option helps, but only up to a point. It was the
compression-time switches that made the difference between hours and minutes
on my 4MB laptop. Here's how a bzip2 man page explains it:

<snip>
       bzip2 compresses large files in blocks. The block size
       affects both the compression ratio achieved, and the
       amount of memory needed both for compression and decom-
       pression. The flags -1 through -9 specify the block size
       to be 100,000 bytes through 900,000 bytes (the default)
       respectively. At decompression-time, the block size used
       for compression is read from the header of the compressed
       file, and bunzip2 then allocates itself just enough memory
       to decompress the file. Since block sizes are stored in
       compressed files, it follows that the flags -1 to -9 are
       irrelevant to and so ignored during decompression.

       Compression and decompression requirements, in bytes, can
       be estimated as:

             Compression: 400k + ( 7 x block size )

             Decompression: 100k + ( 4 x block size ), or
                            100k + ( 2.5 x block size )

       Larger block sizes give rapidly diminishing marginal
       returns; most of the compression comes from the first two
       or three hundred k of block size, a fact worth bearing in
       mind when using bzip2 on small machines. It is also
       important to appreciate that the decompression memory
       requirement is set at compression-time by the choice of
       block size.

       For files compressed with the default 900k block size,
       bunzip2 will require about 3700 kbytes to decompress. To
       support decompression of any file on a 4 megabyte machine,
       bunzip2 has an option to decompress using approximately
       half this amount of memory, about 2300 kbytes. Decompres-
       sion speed is also halved, so you should use this option
       only where necessary. The relevant flag is -s.

       In general, try and use the largest block size memory con-
       straints allow, since that maximises the compression
       achieved. Compression and decompression speed are virtu-
       ally unaffected by block size.

       Another significant point applies to files which fit in a
       single block -- that means most files you'd encounter
       using a large block size. The amount of real memory
       touched is proportional to the size of the file, since the
       file is smaller than a block. For example, compressing a
       file 20,000 bytes long with the flag -9 will cause the
       compressor to allocate around 6700k of memory, but only
       touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the
       decompressor will allocate 3700k but only touch 100k +
       20000 * 4 = 180 kbytes.

       Here is a table which summarises the maximum memory usage
       for different block sizes. Also recorded is the total
       compressed size for 14 files of the Calgary Text Compres-
       sion Corpus totalling 3,141,622 bytes. This column gives
       some feel for how compression varies with block size.
       These figures tend to understate the advantage of larger
       block sizes for larger files, since the Corpus is domi-
       nated by smaller files.

                  Compress   Decompress   Decompress   Corpus
           Flag   usage      usage        -s usage     Size

            -1      1100k       500k        350k       914704
            -2      1800k       900k        600k       877703
            -3      2500k      1300k        850k       860338
            -4      3200k      1700k       1100k       846899
            -5      3900k      2100k       1350k       845160
            -6      4600k      2500k       1600k       838626
            -7      5400k      2900k       1850k       834096
            -8      6000k      3300k       2100k       828642
            -9      6700k      3700k       2350k       828642

<snip>

The original mu .bz2 files were compressed with flag -9. Reading the above
chart, this means that bunzip2 -s needs 2350K to uncompress these files
(100k + 2.5 x 900k, per the formula above). That's more memory than the old
laptop could spare, so it spent hours swapping to disk.

I experimented with the GCC addon, which, when compressed with the -1 flag,
was too big to fit on a floppy. It happens that the -5 switch gives tolerable
decompression speed on this laptop. A list of several experiments...

Size Date filename/filetype

5724160 May 19 09:27 gcc: GNU tar archive
1917731 May 19 10:04 gcc.gzip: gzip compressed data, max compression
1917813 May 19 10:03 gcc.zip: Zip archive data, at least v2.0 to extract
1853573 May 19 09:27 gcc1.bz2: bzip2 compressed data, block size = 100k
1800589 May 19 09:29 gcc2.bz2: bzip2 compressed data, block size = 200k
1781732 May 19 08:24 gcc3.bz2: bzip2 compressed data, block size = 300k
1778375 May 19 09:22 gcc4.bz2: bzip2 compressed data, block size = 400k
1753495 May 19 09:25 gcc5.bz2: bzip2 compressed data, block size = 500k
1752332 Jul 25 1999 gcc.tgz: bzip2 compressed data, block size = 900k

The smallest file is the original 900k-block bzip2 file with the .tgz
extension. The 500k-block .bz2 comes close: it's only about 1K bigger, so it
still fits on a floppy, yet needs only 1350K of RAM to decompress.
Surprisingly, the laptop was happy with this '-5' file. The other archives are
too big to fit in 1722K. (1722K = 1763328 bytes)
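
For anyone who wants to repeat the comparison, the test files can be rebuilt
along these lines. (A sketch only: the tar input path is a placeholder, and
the exact options I used may have differed. file(1) reports each archive's
block size, as in the filetype column above.)

    # Sketch: rebuild the test set from the addon's file tree.
    tar -cf gcc /usr/local/src/gcc-stuff    # path is a placeholder
    for n in 1 2 3 4 5
    do
        bzip2 -$n -c gcc > gcc$n.bz2        # one archive per block size
    done
    gzip -9 -c gcc > gcc.gzip               # gzip at max compression
    file gcc*                               # reports each block size
    ls -l gcc*                              # compare sizes to 1763328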

If this sort of disk/RAM compromise is worthwhile, perhaps the script that
makes mu's disks could gain an optional function that picks the smallest bzip2
block size whose output still fits on one disk. There would be a loop that
tries the bzip2 switches -1 through -9 (or maybe -5 through -9?) and compares
each file size to a limiting value; a rough sketch follows. It would mean
bigger archives, but they'd work better on smaller machines. Almost a paradox.
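
Something like this, say. (Only a sketch: the LIMIT value and filenames are
my assumptions, and I haven't looked at how mu's setup scripts are actually
organized.)

    #!/bin/sh
    # Compress $1 with the smallest bzip2 block size whose output
    # still fits on one floppy.
    LIMIT=1763328                 # 1722K format, in bytes (assumed)
    ARCHIVE=$1                    # the tar archive, e.g. "gcc"

    for n in 1 2 3 4 5 6 7 8 9
    do
        bzip2 -$n -c $ARCHIVE > $ARCHIVE.bz2 || exit 1
        size=`wc -c < $ARCHIVE.bz2`
        if [ $size -le $LIMIT ]
        then
            # first fit = smallest block size = least decompress RAM
            echo "bzip2 -$n fits: $size bytes"
            exit 0
        fi
    done
    echo "even -9 is too big for one disk" 1>&2
    exit 1

Since -1 is tried first, the first archive that fits is also the one with the
smallest decompression footprint, per the man page table; on a 4MB machine
that's the whole point.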

Whatever the case for mu's archives, maybe this info can help other low-RAM
muLinux enthusiasts.
