Section 5: The Lexis
This specification of the lexis for the Sather programming language uses
the notation defined in BS 6154, Method of defining Syntactic Meta-Language.
A Sather source file text consists of lexical tokens, separated by white
space wherever this is necessary to differentiate one token from another.
Some of the tokens are specified here in terms of a symbol of the form
"xxx_SY". Such a symbol may stand for any culturally defined sequence of one
or more character encodings or, in the case of identifiers and comments, for
a sequence of one or more culturally defined character encodings whose
meaning is defined by the programmer.
For the purposes of textual illustration, the representation forms of the
xxx_SY tokens are given by the strings defined in the separate Reference
specification in Annex D. That annex does not therefore form a necessary part
of this lexis; it is given merely for use in example source text throughout
this document.
5.1 Source Text
NOTE
A token is the longest sequence of encodings which satisfies the
definitions below. As a consequence, separators (see
sub-section 5.8) must be provided after a word
token or a literal. Additionally, literals must be separated from
other literals where adjacent literals are permissible.
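The longest-match rule in the note above can be illustrated by a small Python sketch. The token classes and patterns used here (`WORD`, `NUMBER`, `SYMBOL`) are hypothetical ASCII stand-ins chosen for illustration; they are not the productions of this lexis, whose symbols are culturally defined.

```python
import re

# Hypothetical token classes for illustration only; they are NOT the
# productions of this lexis.
TOKEN_PATTERNS = [
    ("WORD", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("SYMBOL", re.compile(r":=|[:=+\-*/();]")),
]
WHITE_SPACE = re.compile(r"\s+")


def tokenize(text):
    """Split text into (kind, lexeme) pairs using the longest-match rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        # White space only separates tokens; it is never a token itself.
        space = WHITE_SPACE.match(text, pos)
        if space:
            pos = space.end()
            continue
        # Try every class and keep the match that consumes the most text.
        best = None
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)
            if match and (best is None or match.end() > best[1].end()):
                best = (kind, match)
        if best is None:
            raise ValueError(f"no token matches at offset {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens
```

Under these assumed patterns, `tokenize("loop x")` yields two word tokens, whereas `tokenize("loopx")` yields a single word token. This is why a separator is needed after a word token.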
5.2 Word Tokens
5.3 Identifiers
NOTE
The Cluster_Count_SY is retained in the above table for the
current language definition. There is a proposal that it
should be deleted in the next language revision.
There are three forms of 'name' in the source text of a Sather program: the
name of an iter, the name of an abstract class/type, and any other
programmer-defined identifier, that is, one which is neither the name of an
iter method nor that of an abstract class/type.
NOTE
Since there are many world scripts which have neither cases nor even
letters, it is impractical to require that class/type names should be
all upper case letters. This is merely a matter of programming style
where applicable and can have no significance in a program conforming
to this specification.
5.5 Keywords
NOTES
1. A number of words in previous lists of keywords were actually
value expressions, etc., and have been moved to other appropriate
places in this document.
2. The keyword shown in red in the definition above is a
proposed addition to the language to enable named libraries to be
introduced.
5.6 Symbols and Operators
Sather requires a number of 'punctuation marks'. These are either symbols
needed as punctuation when parsing, or binary or unary operators, each of
which is required to be mapped to an associated method call.
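As an illustration of mapping an operator to its associated method call, the following Python sketch rewrites a binary operator expression as a method call. The operator spellings and the method names are assumptions taken from the Sather 1.1 convention (for example `+` mapping to `plus`); they are not defined by this lexis, and `desugar_binary` is a hypothetical helper.

```python
# Assumed operator-to-method table, following the Sather 1.1 convention.
# This lexis does not itself list these names.
BINARY_OPERATOR_METHODS = {
    "+": "plus",
    "-": "minus",
    "*": "times",
    "/": "div",
    "<": "is_lt",
    "<=": "is_leq",
    "=": "is_eq",
    "/=": "is_neq",
    ">": "is_gt",
    ">=": "is_geq",
}


def desugar_binary(left, operator, right):
    """Return the method-call form of 'left operator right' as text."""
    method = BINARY_OPERATOR_METHODS[operator]
    return f"{left}.{method}({right})"
```

Under these assumptions, `desugar_binary("a", "+", "b")` gives the text `a.plus(b)`.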
5.7 Constant Literals
The Sather language permits the expression of five kinds of literal value
(bit, boolean, void, numeric and text) in the source text of a Sather class.
5.7.1 Bit Literals
5.7.2 Boolean Literals
5.7.3 Void Literal
5.7.4 Numeric Literals
NOTE
There are culture scripts for which no Digit_Zero_SY exists; hence
such cultures cannot represent a base for numeric value literals. This
point must be addressed when revising the language specification!
5.7.5 Text Literals
In the two production rules for text literal and for character literal given
below, the opening text quote mark and the closing text quote mark must be the
same character in each application of the rule.
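The matching-quote constraint can be sketched as a simple check in Python. The set of permitted quote marks used here is an assumption for illustration, since the actual quote mark characters are culturally defined; `has_matching_quotes` is a hypothetical helper and it ignores any escape conventions inside the literal.

```python
def has_matching_quotes(lexeme, quote_marks=('"', "'")):
    """Return True if lexeme opens and closes with the same quote mark.

    quote_marks is an assumed set of permitted quote characters; the
    real set is given by the culture-dependent definitions.
    """
    if len(lexeme) < 2:
        return False
    opening, closing = lexeme[0], lexeme[-1]
    return opening in quote_marks and closing == opening
```

With the assumed quote marks, a lexeme that opens with `"` and closes with `'` is rejected, while one that uses the same mark at both ends is accepted.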
5.8 Separators
5.8.1 White Space
5.9 Culture Dependent Definitions
This section of the lexis specifies the characters within specific character
groups as specified by a Local Culture Specification (LCS) made in accordance
with ISO/IEC 14652. Because of this, it is defined below in terms of natural
language strings.
Comments or enquiries should be made to Keith Hopper.
Page last modified: Wednesday, 17 May 2000.