Culture Description: File Formats

 

Introduction
For each culture to which the Required Library may be ported, there are four culture-dependent description files, produced by lcc, the cultural 'compiler'.

The textual descriptions compiled by lcc are those specified by the ISO/IEC standards 14651 (International String Ordering: Method for Comparing Character Strings and Description of the Common Template Tailorable Ordering) and 14652 (Information Technology: Specifications for Cultural Conventions); see section 2 of the Sather language specification.

The three binary files produced in this way enable a program to operate in a culture-independent manner. In addition to them, various message files (using the local repertoire and encoding) are also needed.

All of these files contain components that are written to the file preceded by an octet giving the count of the octets that follow (which may, of course, be zero). The structure tables indicate this with the term Sized(xxx). It denotes an object of at most 256 octets, whose first octet gives the number of octets that follow it in the object's binary string representation. This form is used in particular for textual strings of variable length in the target encoding.
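A Sized(xxx) object can be written and read with a few lines of code. The sketch below is an illustration in Python, not part of the Sather Required Library. The function names encode_sized and decode_sized are hypothetical. The payload is treated as an already-encoded binary string, as the format requires.

```python
def encode_sized(payload: bytes) -> bytes:
    """Prefix payload with a single count octet, giving a Sized(xxx) object."""
    if len(payload) > 255:
        # One count octet can describe at most 255 following octets.
        raise ValueError("a Sized object holds at most 255 following octets")
    return bytes([len(payload)]) + payload


def decode_sized(data: bytes, pos: int = 0) -> tuple[bytes, int]:
    """Read the Sized object starting at data[pos].

    Returns the payload (without its count octet) and the position of the
    next component in the file.
    """
    count = data[pos]            # first octet: number of octets that follow
    start = pos + 1
    end = start + count
    if end > len(data):
        raise ValueError("truncated Sized object")
    return data[start:end], end


# Two consecutive components: "ab" followed by an empty string.
buffer = encode_sized(b"ab") + encode_sized(b"")
first, pos = decode_sized(buffer)
second, pos = decode_sized(buffer, pos)
```

Because decode_sized returns the position of the next component, consecutive sized objects can be read one after another without any separate index.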

Since each file contains many components holding different object values, the structure is described in terms of these objects. Each object may in turn be specified in terms of other objects, until at some level the actual values appear in the file itself.

NOTES

  1. Figures in the table columns headed 'Octets' give the number of octets occupied by the value. The entity is an unsigned whole number unless a Sather Required Library class name is given in the comments column or the name includes the wording 'bit-pattern'. There is one exception to this unsigned-number rule, which is indicated by a note to the relevant table.
  2. All multi-octet values are stored in the file most significant octet first. This does not necessarily apply to sized objects, which are treated as binary strings ready for sizing without any alteration.
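Reading such a multi-octet number amounts to a big-endian conversion. The following Python sketch reads a field of a given octet count at a given position. It returns the value together with the position of the next field. The helper name read_unsigned is hypothetical.

```python
def read_unsigned(data: bytes, pos: int, octets: int) -> tuple[int, int]:
    """Read an unsigned whole number of `octets` octets at data[pos].

    The most significant octet comes first, so this is a big-endian read.
    Returns the value and the position just after the field.
    """
    end = pos + octets
    if end > len(data):
        raise ValueError("field extends past the end of the data")
    return int.from_bytes(data[pos:end], byteorder="big", signed=False), end


# A two-octet field 0x01 0x02 followed by a one-octet field 0xFF.
record = bytes([0x01, 0x02, 0xFF])
value, pos = read_unsigned(record, 0, 2)    # value == 258
small, pos = read_unsigned(record, pos, 1)  # small == 255
```

The sketch uses int.from_bytes rather than the struct module. This allows fields of any width, such as three octets, to be read in the same way.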

Repertoire Map File

This file contains data which enables codes to be converted to tokens and vice versa. This conversion is needed when tokens are used to establish ordering weights, for example when comparing character strings. The file consists of the sections described in the three following tables, in the order given.
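As a rough picture of what the map provides, the Python sketch below holds the mapping in memory in both directions. Several details are assumptions made for illustration only:

- the class name RepertoireMap;
- the use of symbolic names such as <U0041> as tokens;
- the representation of codes as byte strings.

The actual file layout is given by the tables.

```python
class RepertoireMap:
    """Hypothetical in-memory repertoire map: code <-> token in both directions."""

    def __init__(self, pairs):
        self._code_of = {}   # token -> code
        self._token_of = {}  # code -> token
        for token, code in pairs:
            if token in self._code_of or code in self._token_of:
                raise ValueError(f"duplicate mapping for {token!r} or {code!r}")
            self._code_of[token] = code
            self._token_of[code] = token

    def token(self, code: bytes) -> str:
        """Convert an encoded character to its token."""
        return self._token_of[code]

    def code(self, token: str) -> bytes:
        """Convert a token back to its encoded character."""
        return self._code_of[token]


# Illustrative entries for a single-octet encoding.
rmap = RepertoireMap([("<U0041>", b"A"), ("<U0061>", b"a")])
tokens = [rmap.token(bytes([octet])) for octet in b"Aa"]
```

Keeping two dictionaries makes each direction a single lookup. Rejecting duplicate mappings keeps the two directions consistent with each other.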

 

Ordering Specification File

Although this file is part of the cultural specifications, it is used in string ordering, since it contains the weights attached for ordering purposes to particular tokens (codes). The file therefore contains both a large sequence of tokens with their corresponding weights and one or more ranges of tokens, used where a common set of weights applies to every token in a range. Note that the lettered size values are unique within the entire file!
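Looking up the weights for a token therefore involves checking the individual entries first and then the ranges. The Python sketch below rests on several assumptions made purely for illustration:

- tokens are integers;
- ranges do not overlap;
- an individual entry takes precedence over a range containing the same token.

The class name OrderingTable is hypothetical.

```python
from bisect import bisect_right


class OrderingTable:
    """Hypothetical in-memory ordering table: single entries plus ranges."""

    def __init__(self, entries, ranges):
        # entries: mapping token -> tuple of weights
        # ranges: iterable of (first, last, weights); assumed not to overlap
        self._entries = dict(entries)
        self._ranges = sorted(ranges, key=lambda r: r[0])
        self._starts = [first for first, _, _ in self._ranges]

    def weights(self, token: int) -> tuple:
        """Return the weights for token, checking single entries first."""
        if token in self._entries:
            return self._entries[token]
        # Find the last range whose first token is <= token.
        i = bisect_right(self._starts, token) - 1
        if i >= 0:
            first, last, weights = self._ranges[i]
            if first <= token <= last:
                return weights
        raise KeyError(f"no weights for token {token:#x}")


table = OrderingTable(
    entries={0x41: (1, 1), 0x61: (1, 2)},
    ranges=[(0x4E00, 0x9FFF, (500, 0))],
)
```

Sorting the ranges by their first token lets a binary search find the only range that could contain a given token. This keeps the lookup fast even when the file lists many ranges.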

 

Cultural Specification File

This last file in the group specified by the international standards contains all of the remaining cultural specification components defined in ISO/IEC 14652. Despite the complexity of the data structures it contains, it is the smallest of the three files.

Note that Boolean values are represented by the bit-patterns 0x0 for false and 0x1 for true. If a 'present' component is false, the table entry immediately following it is omitted from the file. Such optional entries are shown between square brackets ("[" and "]").
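A reader for such an optional entry first checks the 'present' value. The Python sketch below is hypothetical in its function name. It assumes that the Boolean occupies a single octet. It also assumes, so that the example is complete, that the optional entry is a Sized object.

```python
def read_optional_sized(data: bytes, pos: int):
    """Read a 'present' octet and, only if it is true, the entry after it.

    Returns (entry or None, position of the next component).
    """
    present = data[pos]
    pos += 1
    if present == 0x0:
        return None, pos          # entry omitted from the file
    if present != 0x1:
        raise ValueError(f"invalid Boolean octet {present:#04x}")
    # Assumed for illustration: the optional entry is a Sized object.
    count = data[pos]
    start = pos + 1
    end = start + count
    if end > len(data):
        raise ValueError("truncated optional entry")
    return data[start:end], end
```

When the flag is false, only the flag octet is consumed. The next octet therefore belongs to the following component of the structure.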

  

Note that in the above structure the offset is a signed number formed from the given number of octets.
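The note does not state how negative values are represented. The following Python sketch decodes such an offset on the assumption that it is in two's complement with the most significant octet first; that representation is not given above. The helper name read_signed is hypothetical.

```python
def read_signed(data: bytes, pos: int, octets: int) -> tuple[int, int]:
    """Read a signed offset of `octets` octets at data[pos].

    ASSUMPTION: two's-complement representation, most significant octet first.
    Returns the value and the position just after the field.
    """
    end = pos + octets
    if end > len(data):
        raise ValueError("field extends past the end of the data")
    return int.from_bytes(data[pos:end], byteorder="big", signed=True), end
```

Under this assumption, the two octets 0xFF 0xC4 decode to -60 and 0x00 0x3C to 60.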

 

Note that the time and date structure contains a number of single-octet logical values. If such a value has the bit-pattern 0x1 (true), the following entity comes next in the file. If it is 0x0 (false), that entity is omitted from the file. Such entities are written in square brackets to indicate this possibility.

 

Note that the days component of the elapsed time will inevitably be zero in this context!

 

Note that the alternate digits are given in sequence from the encoding for the numeric value zero upwards.
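This ordering means that the alternate form of a value can be found by indexing. The Python sketch below follows the POSIX convention for the LC_TIME alt_digits keyword, where entry i is the alternate symbol for the whole number i. Some details are assumptions for illustration:

- the function name;
- the use of Python strings for the encodings;
- the fallback to ordinary decimal digits.

```python
def alternate_form(value: int, alt_digits: list[str]) -> str:
    """Return the alternate symbol for value, or its ordinary decimal form.

    alt_digits[i] is the alternate symbol for the number i, because the
    file lists them in sequence from the value zero upwards.
    ASSUMPTION: values without an alternate symbol fall back to ordinary
    decimal digits.
    """
    if 0 <= value < len(alt_digits):
        return alt_digits[value]
    return str(value)


# Arabic-Indic digits U+0660 to U+0669 as an illustrative alternate set.
arabic_indic = [chr(0x0660 + i) for i in range(10)]
```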

 

Culture-dependent Sather Environment Description File
This file contains the target-culture character codes for a number of punctuation (and other) marks. These marks are used either in parsing the cultural specification files or in generating the binary files described above. They are identified by names taken from the textual form of the repertoire map specification, and appear in this file as 34 encodings. They are followed by file mode strings and file path punctuation symbols related to the run-time environment of the particular computer system. For detailed information, the reader is referred to the source files in SATHER_HOME/resources/lcc-Data/definitions/sather.

 


Comments or enquiries should be made to Keith Hopper.
Page last modified: Thursday, 9 March 2000.
Produced with Amaya