From: Alfie Costa (agcosta@gis.net)
Date: Mon Feb 28 2000 - 19:30:22 CET
On 23 Feb 00, at 1:37, Alfie Costa <mulinux@sunsite.auc.dk> wrote:
> Ugly kludges in progress...
A new less ugly and debugged /bin/file is attached to this message.  Based on 
timing how long "file *" takes in various directories, it's around 2-10 times 
faster than the original rustic 'file'.  Some notes...
It's got a "-d" switch, which with good luck shouldn't be necessary.  To peruse 
the debug output, try:
file -d foo 2> debug.txt
less debug.txt
The search order, for which magic numbers or strings to try first, has been 
changed.  This version of 'file' looks for script files first, because those 
and text files are what Midnight Commander's F3 is probably most used for.  
Checking for plain text files is still slowest of all though, which can 
probably be improved.
Same as last time, no temporary files are written.
TestString() does its compares using variables.  At first it might seem that 
this would cause trouble in cases like this:
A="teststring"
B="test\0\0string"  # the \0's are supposed to be nulls.
/bin/ash would remove the nulls and say these strings are the same.  However, 
since 'dd' is used to get the data, and 'dd' uses a count of how many bytes to 
get, what would really happen is:
A="teststring"
B="test\0\0stri"  # note the "ng" is missing
Then /bin/ash would drop the nulls and shorten the 10-byte 'B' to the 8-byte 
"teststri", which is not the same as 'A'.  So it really is safe to compare 
text strings.
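The byte counting above can be checked with a small sketch.  It assumes a 
shell that drops NUL bytes from command substitution output, as /bin/ash is 
described as doing here (bash and dash do the same, though recent bash also 
prints a warning).  The scratch file from mktemp and the variable 'verdict' 
exist only for this demonstration and are not part of 'file'.

```shell
# Demo only: 'file' itself writes no temporary files.
tmp=`mktemp` || exit 1
printf 'test\000\000string' > "$tmp"    # 12 bytes on disk

A="teststring"
# dd counts the two NULs, so it stops after "stri" (10 bytes read).
B=`dd if="$tmp" bs=1 count=10 2>/dev/null`
rm -f "$tmp"

# The shell then drops the NULs, leaving 8 characters in B.
if test "$A" = "$B"
then
    verdict="same"
else
    verdict="different"
fi
echo "B is '$B' (${#B} chars): $verdict"
```

Because the count given to dd includes the NULs, the captured string comes 
out shorter than the pattern, and the comparison fails as it should.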
TestOctal().  The trick to compare binary data is to do it one byte at a time, 
and to add some extra character to the variable so that null strings don't 
cause any trouble.  Example:
null=\\000		# set the variable null to '\000'  (octal)
foo=.`echo -e $null`	# set foo to a period, followed by whatever ASCII
                        # char was in null.
koo="$foo"		# duplicate foo for a demo
test "$foo" = "$koo"	# compare them
echo $?			# outputs a zero if true.
Other than nulls, variables seem to be able to hold the other 255 possible 
characters just fine.
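A minimal sketch of the one-byte-at-a-time compare, for trying the idea 
outside of 'file'.  The helper name byte_is and the sample file are 
hypothetical, and printf is swapped in for 'echo -e' because not every 
shell's echo understands -e.

```shell
# byte_is FILE OFFSET OCTAL  (hypothetical helper, not part of 'file')
# Succeeds if the byte at OFFSET in FILE has the given octal value.
byte_is()
{
    fmt="\\$3"                  # e.g. 037 becomes the printf escape \037
    want=.`printf "$fmt"`       # leading period keeps the string non-empty
    got=.`dd if="$1" bs=1 skip="$2" count=1 2>/dev/null`
    test "$got" = "$want"
}

sample=`mktemp` || exit 1
printf '\037\213\000\052' > "$sample"   # gzip magic, a NUL, then '*'

byte_is "$sample" 0 037 && first=match
byte_is "$sample" 1 213 && second=match
byte_is "$sample" 2 000 && third=match      # NUL: both sides end up as "."
byte_is "$sample" 3 053 || fourth=mismatch  # the file has '*', not '+'
rm -f "$sample"
echo "$first $second $third $fourth"
```

One caveat with this approach: a NUL byte, a newline (command substitution 
strips trailing newlines), and a read past the end of the file all leave 
just the period, so the compare cannot tell those three apart.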
Spaces... Spaces are disgustingly important when assigning variables and 
comparing them.  Unlike some other computer languages, if the spacing isn't 
perfect, things go wrong.  This hasn't even anything to do with quotes.  
Examples:
foo=5			# good, no spaces when assigning a variable.
test $foo=5		# bad, test sees the single word "5=5"...
echo $?			# always 0, whatever foo holds.
test $foo = 5	        # good, test and [ ... ] must have spaces.
echo $?			# OK now: 0 only when foo is 5.
A surprising case:
test 6=7		# bad, no spaces
echo $?			# 0, as if 6 equalled 7
test 6 = 7		# OK
echo $?			# 1
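The reason for the surprise: given a single argument, test only checks 
whether that argument is a non-empty string, so "6=7" counts as true.  A 
short sketch (the variable names are only illustrative):

```shell
test 6=7;   one_arg=$?      # one non-empty word: status 0
test 6 = 7; three_args=$?   # a real comparison: status 1
test "";    empty=$?        # one empty word: status 1
echo "$one_arg $three_args $empty"      # prints: 0 1 1
```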
Hope this is useful...
#!/bin/ash
# rustic `file` (by M. Andreoli)
# [ with dd
# (2/28/00 provincial gentrification by A. Costa)
#Syntax
opt=$1
case Z$opt in
Z-d) set -x;shift;;	# debug mode...
Z-h|Z) echo "Usage (mu-file): file [-d] [files]" ; exit ;;
*)
esac
# Functions
# compare with string
TestString()
{
f=$1
offset=$2; n=$3 ; string1="$4" 
string2=`dd if="$f" skip=$offset bs=1c count=$n 2>/dev/null`
test "$string1" = "$string2"
}
# compare with octal \\0x \\0y \\0z ...
TestOctal()
{
f=$1
offset=$2; n=$3
shift 3
for code in $@
do
    p=.`echo -e $code`		# note the leading period to tame the nulls
    m=.`dd if="$f" skip=$offset bs=1c count=1 2>/dev/null`
    test "$m" != "$p" && return 1
    offset=`expr $offset + 1`
done
}
for f in "$@"
do
# special
# test -L first: -d follows symlinks, so a link to a directory would match it
[ -L "$f" ] && echo "$f: symbolic link" && continue
[ -d "$f" ] && echo "$f: directory" && continue
[ ! -f "$f" ] && echo "$f: not existent" && continue
# script
if TestString "$f" 0 1 '#'	# be lazy
then
    TestString "$f" 0  9 '#!/bin/sh' && echo "$f: Bourne shell script text" && continue
    TestString "$f" 0 10 '#!/bin/ash' && echo "$f: ash script text" && continue
    TestString "$f" 0 11 '#!/bin/bash' \
    && echo "$f: Bash shell script text" && continue
    TestString "$f" 0 10 '#!/bin/csh' && echo "$f: C shell script text" && continue
    TestString "$f" 0 10 '#!/bin/ksh' && echo "$f: Korn shell script text" && continue
    TestString "$f" 0 11 '#!/bin/perl' && echo "$f: perl command text" && continue
    TestString "$f" 0  7 '#!/bin/' && echo "$f: script text" && continue
fi
# Linux 
TestString "$f" 1 3 ELF \
&& echo "$f: ELF executable" && continue
TestString "$f" 1080 1 'S' \
&& echo "$f: Linux/i386 ext2 filesystem [probable :(]" && continue
TestString "$f" 4086 10 'SWAP-SPACE' \
&& echo "$f: Linux/i386 swap file" && continue
# compressed
TestString "$f" 257 5 ustar && echo "$f: TAR archive" && continue
TestOctal "$f" 0 2  \\037 \\0213 && echo "$f: gzip compressed data" && continue
TestString "$f" 0 2 BZ && echo "$f: bzip compressed data" && continue
TestString "$f" 0 2 PK && echo "$f: Zip archive data" && continue
TestString "$f" 0 4 'Rar!' && echo "$f: RAR archive data" && continue
# text
TestString "$f" 0 5 '%PDF-' && echo "$f: PDF document" && continue
TestString "$f" 0 2 '%!' \
&& echo "$f: PostScript document text" && continue
# both codes need the doubled backslash, or the shell eats it before echo -e
TestOctal "$f" 0 2 \\0367 \\002 && echo "$f: TeX DVI file" && continue
# Audio
TestString "$f" 0 4 MThd  && echo "$f: Standard MIDI data" && continue
if TestString "$f" 0 4 RIFF 
then
    echo -n "$f: Microsoft RIFF"
    if TestString "$f" 8 4 WAVE 
    then
        echo ", WAVE audio data" 
    else
        echo
    fi
    continue
fi
# image
TestOctal "$f" 0 2 \\0377 \\0330 && echo "$f: JPEG image data" && continue 
TestString "$f" 0 4 GIF8 && echo "$f: GIF image data" && continue
TestString "$f" 0 2 BM && echo "$f: PC bitmap data" && continue
TestOctal "$f" 0 4 \\0115 \\0115 \\0 \\052 \
&& echo "$f: TIFF image data, big-endian" && continue
TestOctal "$f" 0 4 \\0111 \\0111 \\052 \\0 \
&& echo "$f: TIFF image data, little-endian" && continue
# binary
TestString "$f" 0 2 'MZ' && echo "$f: MS-DOS executable (EXE)" && continue
# HP48
TestString "$f" 0 7 'HPHP48-' && echo "$f: HP48 binary" && continue
TestString "$f" 0 5 '%%HP:' && echo "$f: HP48 text" && continue
echo "$f: ASCII text or data"
done