mu TECH: a new uglified 'hexd'

From: Alfie Costa (agcosta@gis.net)
Date: Wed Apr 26 2000 - 01:10:22 CEST


Hello mu-Coders,

Attached is a 'hexd.c' which works with pipes. It is faster and can perform
new tricks. It may compile a little bigger. The code includes a few gratuitous
experiments. There may be too many comments in the source code, or the wrong
kinds of comments. It's all based on Michele's old 'hexd.c', now barely
recognizable.

What it does:

1) It does what the old 'hexd' does. This works...

hexd -c < foo.v1 | hexd -d > foo.v2

...so that 'foo.v1' and 'foo.v2' will be identical. The old 'hexd' problem
seemed to be related to the use of 'read()' in the routine that 'hexd -d'
calls. Changing that routine to use 'fgets()' seems to be what fixes it. It's
still a bit of a mystery, but it works.

2) Two new switches, "-t" (for Text) and "-x" (for heX). "-t" outputs a text
dump, without hex code or linefeeds; if a particular byte is not a printable
character, a '.' replaces it. "-x" outputs a hex dump with no text, no line
numbers, no spaces, and no linefeeds.

3) The input routine called by 'hexd -d' magically understands the output of
'hexd -x'. So this works:

hexd -x < foo.v1 | hexd -d > foo.v2

4) 'hexd -d' is now flexible about user input, spacing, and the like. 'hexd -d'
used to ignore the first 8 bytes of every line (the line numbers of the hex
dump), and the hex code had to be spaced in three-character groups like "0d 0a
12 15...". Variations like "d a 1215" or "0d0a1215" or "d, a, 12, 15" would
not work (or at least the output would be different). Those variant inputs
should work now, producing identical output.

'hexd -d' also works without any linefeeds, but in such cases it is possible to
confuse it, unless the spacing of the input is 100% regular. Spacings of 2, 3,
4, and multiples thereof are good. More comments in the code attempt to
explain this.

Otherwise, when using linefeeds, the input can be much more chaotic.

5) Now it can code/decode files bigger than one megabyte. The limit should be
the maximum size of an 'unsigned int'. I tried it on a 3 meg file, and that
worked, although big files process too slowly to be much fun with this code.
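As a rough illustration of items 2 to 4 above, here is a minimal sketch of what "-x" style output, "-t" style output, and a tolerant line decoder could look like. It is not the code from the attached 'hexd.c'. The names 'hex_dump', 'text_dump', 'hex_value' and 'decode_line' are hypothetical, and the sketch assumes that any non-hex character acts as a separator and that a lone hex digit stands for one byte. It does not skip the line numbers or the text column that 'hexd -c' produces.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* "-x" style: two hex digits per byte, no spaces, no linefeeds.
   out must have room for 2*n + 1 characters. */
static void hex_dump(const unsigned char *in, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < n; i++) {
        out[2 * i]     = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0x0f];
    }
    out[2 * n] = '\0';
}

/* "-t" style: printable bytes pass through, anything else becomes '.'.
   out must have room for n + 1 characters. */
static void text_dump(const unsigned char *in, size_t n, char *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = isprint(in[i]) ? (char)in[i] : '.';
    out[n] = '\0';
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Tolerant decode of one line (assumed rules, see the note above):
   non-hex characters are skipped, hex digits are consumed in pairs,
   and a digit with no partner is taken as one byte ("d" gives 0x0d).
   Returns the number of bytes written to out. */
static size_t decode_line(const char *line, unsigned char *out)
{
    size_t n = 0;

    while (*line != '\0') {
        int hi = hex_value((unsigned char)*line);
        int lo;

        line++;
        if (hi < 0)
            continue;                 /* separator: space, comma, newline */
        lo = hex_value((unsigned char)*line);
        if (lo >= 0) {
            out[n++] = (unsigned char)(hi * 16 + lo);
            line++;
        } else {
            out[n++] = (unsigned char)hi;
        }
    }
    return n;
}
```

With these rules, "0d 0a 12 15", "d a 1215", "0d0a1215" and "d, a, 12, 15" all decode to the same four bytes. A run with an odd number of digits, such as "d0a", is ambiguous, which is the kind of input that can confuse a decoder when the spacing is irregular.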

Why:

For 'ash' 'case...esac' switches that reliably accept binary data, to be used
with another 'file' improvement. The idea would be to use 'dd' to pipe some
data to 'hexd -t', and assign that to a variable, which is fed to
'case...esac'. I hope to post another faster 'file' script soon...

How:

The attached C code was a bit of a learning experience for me, so parts are
eccentric...

1) Some of the logic may be involved, but I have tried (a little) not to write
obfuscated code. For instance, where it would be possible to write:

            buf[0] = *i++;

This is spread out to:

            buf[0] = *i;
            i++;

Both variants compile to the same object code. It's not like with 'ash'
scripts, where the shorter code is almost always interpreted faster.

Less pleasantly, I replaced these readable lines:

   for (j=n; j<R; j++ ) fprintf(stdout,"%2c ",' ');
   fprintf(stdout,"[");

...with:

    printf( "%*s", ( ((COLUMNS*3)+1) - (n*3) ) , "[" );

...which is faster, but looks messy. The two '*3's there could be combined
into one, but the way it's typed should let more work be done at compile-time
and less work at run-time. In 'gcc' however it makes no difference -- 'gcc'
does a lot of optimizing.

2) Pointers were used, which might be harder to read. My first attempt used
array notation and so it was easier to read, but using pointers was faster.
They aren't necessary though.

3) I re-used the buffer that stores the help. It took up 300+ bytes and was
ideal as a big line buffer. There are some fancy compiler directives to make
best use of the space. This recycling wasn't necessary, but it seemed like an
interesting experiment, and it works.

4) All the subroutines have been renamed to names that might or might not be
easier to understand. The code indenting is more "3D", but it isn't very
consistent.

5) Used some 'goto's in one routine because nothing else was as fast. Not that
many though, and the labels are all near each other, so a reader can easily see
what's going on in 8 consecutive lines of code.

6) Maybe too many redundant functions. C is sometimes annoying because its
standard function library isn't well factored; for example, 'printf' is sort of
like 'puts', and it's often hard to know which function compiles smaller or
faster. I added some 'putchar()'s, and these made the object file bigger with
every use, probably because they're macros, or for some reason like that. Such
a basic function, yet using it is a real trade-off. A big swiss-army-knife
like 'printf', on the other hand, isn't a macro, but probably slows the code
because of all the options it has.
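The padding replacement described in item 1 above can be checked with a small sketch. It assumes that 'R' in the old loop and 'COLUMNS' in the new line are the same value (the number of bytes per dump line), and it picks 16 for that value purely for illustration. The function names 'pad_loop' and 'pad_width' are hypothetical, and 'sprintf' into a buffer stands in for the original output calls so the two results can be compared.

```c
#include <stdio.h>
#include <string.h>

#define COLUMNS 16   /* assumed bytes per dump line, for illustration only */

/* Old form: one three-character blank group per missing byte, then "[".
   out must have room for 3*COLUMNS + 2 characters. */
static void pad_loop(char *out, int n)
{
    int j;

    out[0] = '\0';
    for (j = n; j < COLUMNS; j++)
        out += sprintf(out, "%2c ", ' ');   /* writes exactly 3 spaces */
    strcpy(out, "[");
}

/* New form: a single call that right-justifies "[" in a computed width. */
static void pad_width(char *out, int n)
{
    sprintf(out, "%*s", ((COLUMNS * 3) + 1) - (n * 3), "[");
}
```

For every 'n' from 0 to COLUMNS both functions should produce the same string: 3 * (COLUMNS - n) spaces followed by "[".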

Problems or questions:

1) I don't know how to get 'gcc' to compile things small. With no switches,
'gcc' produces a 6K 'hexd' for the original version, yet mu's 'hexd' is only
3.6K. So the compiled version is not included.

2) Maybe it would have been better to make separate simple decoding routines
for the "-x" and "-c" outputs, instead of having a single complicated routine
do both jobs. The separate routines might be called with different switches.
This approaches one of those political questions of "usage vs. programming
complexity". This code leans more towards the 'harder to code but easier to
use' camp.

3) I can't think of many practical uses for decoding that '-x' output; it just
seemed like a good idea. Perhaps another switch that does what '-x' does, but
also inserts 80-column linefeeds or sticks in a space every third character
would make it more useful.

Well, that's it for now...


   ---- File information -----------
     File: hexd.c.gz
     Date: 25 Apr 2000, 3:06
     Size: 3002 bytes.
     Type: Unknown

