RFC: Multilanguage in muLinux (long)

From: Michele Andreoli (m.andreoli@tin.it)
Date: Tue Mar 27 2001 - 16:22:47 CEST


Hello international friends,

I'm implementing multi-language support in muLinux, because I find it
very instructive to build. Here is an outline of the approach I'm using.
I think this is a smart solution but, anyway, any comment from you is
highly appreciated.

The simple idea is to replace every "echo" and "cat" command in Setup,
wizards, boot scripts, etc., with a single command I called "tell"
(because "say" is already used as a synth command).
I developed "tell" as a simple AWK command, using hashes, i.e. arrays
with generalized indexing.

You can use "tell" everywhere you would use "echo".

Examples:

                tell "The month is March"
         
or
                tell <<END
                The month is March
                I like the penguin
                END

How does it work?
------------------
It simply searches for a match in a DB (a file), using hashes.
If no match is found, it simply outputs the fractured English it was
given as input. This guarantees that every script will still work
even without a translation.
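
A minimal sketch of the lookup-with-fallback idea, not the actual "tell"
source. The function name tell_demo and the hard-coded sample entry are
assumptions made only for illustration; the real command loads its table
from the DB file.

```shell
#!/bin/sh
# Sketch only: tell_demo and the hard-coded table are hypothetical,
# the real "tell" loads its table from the DB file.
tell_demo() {
    # Pass the sentence through the environment so awk sees it unchanged.
    MSG="$*" awk 'BEGIN {
        # An AWK hash: the whole English sentence is the array index.
        dict["I like the penguin"] = "Mi piace il pinguino"

        msg = ENVIRON["MSG"]
        if (msg in dict)
            print dict[msg]     # a translation exists
        else
            print msg           # fall back to the original sentence
    }'
}

tell_demo "I like the penguin"      # found in the table
tell_demo "The month is March"      # not found, echoed unchanged
```

Because the sentence itself is the key, a script needs no message numbers
or identifiers, and an unknown sentence degrades to a plain "echo".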

DB format
----------

Very simple: it uses control characters as AWK separators. This is the
template:

        fractured-english sentence
        ^A
        translated sentence
        ^B
                
ad libitum. Multi-line sentences are allowed, because the record separator
is ^B (\002 octal) and the field separator is ^A (\001 octal).
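
As a rough illustration of this layout, the sketch below writes two records
and reads them back with awk. The file name, the function name list_db and
the sample translations are assumptions, and trimming the newlines around
each separator is one possible way to handle the template, not necessarily
what "tell" does.

```shell
#!/bin/sh
# Sketch only: the file name, list_db and the sample sentences are
# hypothetical.
DB=${TMPDIR:-/tmp}/tell-format-demo.$$

# Two records following the template: English, ^A, translation, ^B.
printf 'I like the penguin\n\001\nMi piace il pinguino\n\002\n' > "$DB"
printf 'The month is March\nI like the penguin\n\001\n' >> "$DB"
printf 'Siamo a marzo\nMi piace il pinguino\n\002\n' >> "$DB"

# Print every pair found in the file named by the first argument.
list_db() {
    awk '
        # Drop the newlines that surround each separator in the template.
        function trim(s) { gsub(/^\n+|\n+$/, "", s); return s }
        # Records end at ^B, fields are split at ^A.
        BEGIN { RS = "\002"; FS = "\001" }
        # The newline after the last ^B yields a one-field record: skip it.
        NF == 2 { printf "[%s] => [%s]\n", trim($1), trim($2) }
    ' "$1"
}

list_db "$DB"
rm -f "$DB"
```

Since neither separator is a newline, the second record keeps its internal
line break on both sides of the pair.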

We have a noticeable amount of redundancy here, because the English
sentence is duplicated: once in the original script and once in the DB.

But the advantage is hard to beat: we can cooperate on the translation,
starting from one identical file. Sentences aren't numbered, etc.

DBs are simple files named it, en, de, fr, etc., stored in some
directory.

"tell" really learns sentences
-------------------------------

When I run a script containing the "tell" command, any new sentence
is cached in the DB. The "translated sentence" is, at the beginning,
simply the keyword "-0-".
When I have finished running every script or setup, the DB is filled with
these placeholders. So I can now open the DB and replace each "-0-" keyword
with its translation.
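
The learning step could look roughly like the sketch below. The names
tell_learn and TELL_DB are assumptions, the sentence is taken from the
arguments only (no here-document support), and printing the English text
while the entry still holds "-0-" is a guess at the behaviour described
here, not a copy of the real "tell".

```shell
#!/bin/sh
# Sketch only: tell_learn and TELL_DB are hypothetical names.
TELL_DB=${TELL_DB:-${TMPDIR:-/tmp}/tell-learn-demo.db}

tell_learn() {
    msg="$*"
    [ -f "$TELL_DB" ] || : > "$TELL_DB"
    # Look the sentence up; awk exits with status 1 if it is not cached.
    if out=$(MSG="$msg" awk '
            function trim(s) { gsub(/^\n+|\n+$/, "", s); return s }
            BEGIN { RS = "\002"; FS = "\001"; found = 0 }
            NF == 2 && trim($1) == ENVIRON["MSG"] {
                found = 1; tr = trim($2); exit
            }
            END {
                if (!found) exit 1
                # "-0-" marks a sentence nobody has translated yet
                # (assumed here to fall back to the English text).
                if (tr == "-0-") print ENVIRON["MSG"]
                else print tr
            }' "$TELL_DB")
    then
        printf '%s\n' "$out"
    else
        # New sentence: cache it with the placeholder, then echo it as is.
        printf '%s\n\001\n-0-\n\002\n' "$msg" >> "$TELL_DB"
        printf '%s\n' "$msg"
    fi
}
```

Calling tell_learn twice with the same sentence adds only one record; once
a translator replaces that record's "-0-", later calls print the
translation instead.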

I proceed by iterative approximation, because during the first few runs the
language seems more fractured than normal, with a very picturesque mixture
of Italian and "fractured English".

Here-documents and string substitution
---------------------------------------

What happens with a code fragment such as:

                echo "The NIC card is $CARD" ?

Because the simple replacement:

                tell "The NIC card is $CARD"

is very bad (it fills the DB with a different sentence for each value of
CARD), it is replaced with:

                tell "The NIC card is %s" | printf -r $CARD

so we have to translate one sentence: "The NIC card is %s".

Yes: I hacked the "printf" command a bit. Now, with the -r option,
it can read the C-style format string from standard input.

The trick covers also this case:

        tell <<END | printf -r "$IO" "$IRQ" "$CARD" "`date`"
        The IO is %s,
        the IRQ is %s and the card is %s
        Current time is %s
        END
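
For readers without the patched command, the effect of "printf -r" can be
approximated with a small shell function. This is only a sketch: printf_r
is a hypothetical stand-in built on the standard printf, and the sample
values are made up.

```shell
#!/bin/sh
# Sketch only: printf_r is a hypothetical stand-in for the patched
# "printf -r"; it is not the muLinux implementation.
printf_r() {
    fmt=$(cat)      # the format string arrives on standard input
    # The format is used as is, so a translated sentence that needs a
    # literal percent sign must spell it "%%".
    printf "$fmt\n" "$@"
}

CARD=ne2000
echo "The NIC card is %s" | printf_r "$CARD"

# A multi-line format, as produced by a here-document:
printf_r 0x300 5 <<END
The IO is %s,
the IRQ is %s
END
```

Note that the standard printf repeats the whole format when it receives
more arguments than there are %s fields, so an argument containing spaces
has to be quoted to stay in one field.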

Conclusions
------------

The only big disadvantage I can see at the moment is this: if I
change a single character in some string in Setup, "tell" will
fail to find the match, and it simply outputs the new
fractured sentence.

We need fuzzy logic, separate name spaces, etc. in order to do
a really good job, or muLinux will start to speak its own kind
of Esperanto-Babel language!

 
Michele

-- 
In summing up, I wish I had some kind of affirmative message to leave
you with, I don't. Would you take two negative messages? - Woody Allen