The Sather Language: Efficient, Interactive, Object-Oriented Programming

The Sather Language: Efficient, Interactive, Object-Oriented Programming
Stephen Omohundro
Dr. Dobbs Journal

Introduction

Sather is an object oriented language which aims to be simple, efficient, interactive, safe, and non-proprietary. One way of placing it in the "space of languages" is to say that it aims to be as efficient as C, C++, or Fortran, as elegant and safe as Eiffel or CLU, and to support interactive programming and higher-order functions as well as Common Lisp, Scheme, or Smalltalk.

Sather has parameterized classes, object-oriented dispatch, statically-checked strong typing, separate implementation and type inheritance, multiple inheritance, garbage collection, iteration abstraction, higher-order routines and iters, exception handling, constructors for arbitrary data structures, and assertions, preconditions, postconditions, and class invariants. This article describes the first few of these features. The development environment integrates an interpreter, a debugger, and a compiler. Sather programs can be compiled into portable C code and can efficiently link with C object files. Sather has a very unrestrictive license which allows its use in proprietary projects but encourages contribution to the public library.

The original 0.2 version of the Sather compiler and tools was made available in June 1991. This article describes version 1.0. By the time the article appears, the combined 1.0 compiler/interpreter/debugger should be available on "ftp.icsi.berkeley.edu" and the newsgroup "comp.lang.sather" should be activated for discussion.

Code Reuse

The primary benefit promised by object-oriented languages is reuse of code. Sather programs consist of collections of modules called "classes" which encapsulate well-defined abstractions. If the abstractions are chosen carefully, they can be used over and over in a variety of different situations.

An obvious benefit of reuse is that less new code needs to be written. Another important benefit is that reusable code is usually better written, more reliable and easier to debug. This is because programmers are willing to put more care and thought into writing and debugging code which will be used in many projects. In a good object-oriented environment, programming should feel like plugging together pre-fabricated components. Most bugs occur in the 10 percent or so of newly written code and not in the 90 percent which consists of well-tested library classes. This usually leads to much simpler debugging and far greater reliability.

Why don't traditional subroutine libraries give the same benefits? Subroutine libraries make it easy for newly written code to make calls on existing code but do not make it easy for existing code to make calls on new code. Consider a visualization package that displays data on a certain kind of display by calling display interface routines. Later the decision is made that the package should work with a new kind of display. In traditional languages, there is no simple way to get the previously written visualization routines to make calls on the new display interface. This problem is especially severe if the choice of display interface must be made at runtime.

Sather provides two primary ways for existing code to call newly written code. "Parameterized classes" allow the binding to be made at compile time and "object-oriented dispatch" allows the choice to be made at runtime. We demonstrate these two mechanisms using simple classes for stacks and polygons.

Parameterized Classes

Listing 1 shows a class which implements a stack abstraction. We want stacks of characters, strings, polygons, etc., but we don't want to write new versions for each type of element. "STACK{T}" is a "parameterized class" in which the parameter "T" specifies the stack element type. When the class is used, the type parameter is specified.

For example, the class FOO in the listing defines a routine which uses both a stack of characters and a stack of strings. The type specifier "STACK{CHAR}" causes the compiler to generate code with the type parameter "T" replaced by "CHAR". The specifier "STACK{STR}" similarly causes code to be generated based on "STR". Since character objects are usually 8 bits and strings are represented by pointers, the two kinds of stack will have different layouts in memory. The same Sather source code is reused to generate different object code for the two types. We may define a new type (eg. triple length integers) and immediately use stacks of elements of that type without modifying the STACK class. Using parameterized classes adds no extra runtime cost, but the choice of type parameter values must be made at compile time.

class STACK{T} is -- Stacks of elements of type T. attr s:ARR{T}; -- An array containing the elements. attr size:INT; -- The current insertion location. is_empty:BOOL is -- True if the stack is empty. res := (s=void or size=0) end; pop:T is -- Return the top element and remove it. Void if empty. if is_empty then res:=void else size:=size-1; res:=s[size]; s[size]:=void end end; push(T) is -- Push arg onto the stack. if s=void then s:=#ARR{T}(asize:=5) elsif size=s.asize then double_size end; s[size]:=arg; size:=size+1 end; private double_size is -- Double the size of `s'. ns::=#ARR{T}(asize:=2*s.asize); ns.copy_from(s); s:=ns end; clear is -- Empty the stack. size:=0; s.clear end end; -- class STACK{T} class FOO is bar is s1:STACK{CHAR}; s1.push('a'); s2:STACK{STR}; s2.push("This is a string.") end; end; Listing 1.

Object Oriented Dispatch

Listing 2 shows an example of object-oriented dispatch. The class "$POLYGON" is an "abstract" class. This means that it represents a set of possible object types called its "descendants" (in this case "TRIANGLE" and "SQUARE"). Abstract classes define "abstract interfaces" which must be implemented by all their descendants. Here we have only shown the single routine "number_of_vertices:INT" which returns the number of vertices of a polygon. TRIANGLE's implementation of this returns the value "3" and SQUARE's returns the value "4".

Routines in the interface of an abstract type may be called on variables declared by that type. The actual code that is called, however, is determined at runtime by the type of the object which is held by the variable. The class FOO2 defines a routine with a local variable of type "STACK{$POLYGON}". Both TRIANGLE and SQUARE objects can be pushed onto stacks of this type. The call "s.pop" might return either a triangle or a square. The call "s.pop.number_of_vertices" calls either the "number_of_vertices" routine defined by TRIANGLE and returns 3 or the "number_of_vertices" routine defined by SQUARE and returns 4. The choice is made according to the runtime type of the popped object. The names of abstract types begin with a dollar sign to help distinguish them (calls on abstract types are slightly more expensive than non-dispatched calls).

abstract class $POLYGON is ... number_of_vertices:INT; end; class TRIANGLE is inherit $POLYGON; ... number_of_vertices:INT is res:=3 end; end; class SQUARE is inherit $POLYGON; ... number_of_vertices:INT is res:=4 end; end; class FOO2 is bar2 is s:STACK{$POLYGON}; ... n:=s.pop.number_of_vertices; ... end; end; Listing 2

Strong Typing

The Sather type system is a major factor in the computational efficiency, clarity, and safety of Sather programs. It also has a big effect on the "feel" of Sather programming. Many object oriented languages have only weak typing or no typing at all. Sather, on the other hand, is "strongly typed". This means that every Sather object and every Sather variable has a specified type and that there are precise rules defining the types of object that each variable can hold. Sather is able to statically check programs for type correctness. This means that if a piece of Sather code is accepted by the interpreter or compiler, it is impossible for it to assign an object of an incorrect type to a variable.

Statically-checked strong typing helps the Sather compiler to generate more efficient code because it has more information. Sather avoids many of the runtime tag checking operations that are done by less strongly typed languages.

Statically-checked strongly typed languages help programmers to produce programs that are more likely to be correct. For example, a common mistake in C is to confuse the C assignment operator "=" with the C equality test "==". Because the C conditional statement "if(...)" doesn't distinguish between boolean types and other types, a C compiler is just as happy to accept "if(a=b)" as "if(a==b)". In Sather, the conditional statement will only accept boolean values and this kind of mistake is not possible.

Languages like Eiffel and Beta are strongly typed, but are not statically checkable. This means that some of the type checking must be done at runtime. This is certainly preferable to not doing it at all, but reduces the safety of programs. There may be a typing problem in obscure code that is not exercised by test routines. Because the error is not caught by the compiler, it may make it into a final release.

Sather distinguishes "abstract types" which represent more than one type of object from other types which do not. This has consequences for both the conceptual structure and the efficiency of programs. An example which has been widely discussed is the problem of the "add_vertex" routine for polygons. This is a routine which makes sense for generic polygons but does not make sense for triangles, squares, etc. In languages which do not separate abstract types from particular implementations, you are either forced to make all descendants implement routines which don't make sense for them or to leave out functionality in parent classes.

The Sather solution to this problem is based on abstract types. The Sather libraries include the abstract class "$POLYGON" which defines the abstract interface that all polygons must provide. It also includes the descendant class "POLYGON" which implements generic polygons. The "add_vertex" routine is defined in POLYGON but is not defined in $POLGYON. TRIANGLE and SQUARE, therefore, do not need to define it.

Runtime dispatching is only done for calls on variables declared by abstract types. The Sather compiler is itself a large program written in Sather which uses a lot of dispatching. The performance consequences of abstract types were studied by comparing a version of the compiler in which all calls were dispatched to the standard version (Lim and Stolcke, 1991). The use of explicit typing causes one-tenth the number of dispatches and an 11.3% reduction in execution time.

Separate Implementation and Type Inheritance

In most object-oriented languages inheritance both defines the subtype relation and causes the descendant to use an implementation provided by the ancestor. These are quite different notions and confounding them often causes semantic problems. For example, one reason why Eiffel's type system is not statically checkable is that it mandates "covariant" conformance for routine argument types (Meyer, 1992). This means that a routine in a descendant must have argument types which are subtypes of the corresponding argument types in the ancestor. Because of this choice, the compiler cannot ensure argument expressions conform to the argument type of the called routine at compile time. In Sather, inheritance from abstract classes defines subtyping while inheritance from other classes is used solely for implementation inheritance. This allows Sather to use the statically type-safe contravariant rule for routine argument conformance.

Multiple Inheritance

In Smalltalk and Objective-C, each class can only inherit from a single class. In Sather, classes can inherit from an arbitrary number of classes. This property is called "multiple inheritance". It is important because it commonly occurs in modelling physical types. For example, there might be types representing "means of transportation" and "major expenditures". The type representing "automobiles" should be a descendant of both of these. In languages like Smalltalk or Objective-C which only support "single inheritance", you would be forced to either make all "means of transportation" be "major expenditures" or vice versa.

Garbage Collection

The languages derived from C are usually not "garbage collected". This means that the programmer is responsible for explicitly creating and destroying objects. Unfortunately, these memory management issues often cut across natural abstraction boundaries. The objects in a class usually don't know when they are no longer referenced and the classes which use them objects shouldn't have to deal with low level memory allocation issues. Memory management done by the programmer is the source two of the most common kinds of bugs. If an object is freed while still being referenced, a later access may find the memory in an inconsistent state. These so-called "dangling pointers" are difficult to track down because they often cause errors in code which is far removed from the offending statement. "Memory leaks" are caused when an object is not freed even though there are no references to it. Programs with this bug use up more and more memory until they crash. This kind of error is also difficult to find. Sather uses a "garbage collector" which tracks down unused objects and reclaims the space they use automatically. To further enhance performance, the Sather libraries are designed to generate far less garbage than is typical in languages like Smalltalk or Lisp.

Interactive Interpreted Programming

Sather combines the flexibility of an interactive interpreted environment with very high efficiency compiled code. During development, the well-tested library classes are typically run compiled, while the new experimental code is run interpreted. The interpreter also allows immediate access to all the built-in algorithms and data structures for experimentation. Listing 3 is an example of an interactive Sather session.

>5+7 12 >40.intinf.factorial 815915283247897734345611269596115894272000000000 >#OUT + "Hello world!" Hello world! >v::=#VEC(1.0,2.0,3.0); w::=#VEC(1.0,2.0,3.0); >v+w #VEC(2.0, 4.0, 6.0) >v.dot(w) 14.0 >#ARRAY{STR}("grape", "cherry", "apple", "plum", "orange").sort #ARRAY{STR}("apple","cherry","grape","orange","plum") Listing 3

Iteration Abstraction

Most code is involved with some form of iteration. In the loop constructs of traditional languages, iteration variables must be explicitly initialized, incremented and tested. This code is notoriously tricky and is subject to "fencepost errors". Traditional iteration constructs require the internal implementation details of data structures like hash tables to be exposed when iterating over their elements.

Sather allows you to cleanly encapsulate iteration using constructs called "iters" (Murer, Omohundro, and Szyperski, 1993). They are like routines except that their names end in an exclamation point "!", their bodies may contain "yield" and "quit" statements, and they may only be called within loops. The Sather loop construct is simply: "loop ... end". When an iter yields, it returns control to the loop. When it is called in the next iteration of the loop, execution begins at the statement following the "yield". When an iter quits, it terminates the loop in which it appears.

All classes define the iters "until!(BOOL)", "while!(BOOL)", and "break!" to implement more traditional looping constructs. The integer class defines a variety of useful iters including "upto!(INT):INT", "downto!(INT):INT", and "step!(num,step:INT):INT". Listing 4 shows how "upto!" is used to output the digits from 1 to 9.

Container classes such as arrays or hash tables define an iter "elts!:T" to yield the contained elements and an iter "set_elts!(T)" to insert new elements. Listing 4 shows how to set the elements of an array to successive integers and then how to double them. Notice that this loop doesn't have to explicitly test indices against the size of the array.

The tree classes have iters to yield their elements according to the "pre", "post" and "in" orderings. The graph classes have iters to yield the vertices according to a depth-first search and breadth-first search orderings.

>loop #OUT+1.upto!(9) end 123456789 >a::=#ARRAY{INT}(asize:=10) >loop a.set_elts!(1.upto!(10)) end >a #ARRAY{INT}(1,2,3,4,5,6,7,8,9,10) >loop a.set_elts!(2*a.elts!) end >a #ARRAY{INT}(2,4,6,8,10,12,14,16,18,20) Listing 4

The Implementation

The first version of the Sather compiler was itself written in Sather by Chu-Cheow Lim and has been operational for several years. It compiles into C code and has been ported to a wide variety of machines. It is a fairly large program with about 30,000 lines of code in 183 classes (this compiles into about 70,000 lines of C code).

(Lim and Stolcke, 1991) extensively studied the performance of the compiler on both MIPS and Sparc architectures. Because the compiler uses C as an intermediate language, the quality of the executable code depends on the match of the C code templates used by the Sather compiler to the optimizations employed by the C compiler. Compiled Sather code runs within 10% of the performance of hand-written C code on the MIPS machine and is essentially as fast as hand-written C code on the Sparc architectures. On a series of benchmark tests (including examples like towers of Hanoi, 8 queens, etc.) Sather performed slightly better than C++ and several times better than Eiffel. The new compiler performs extensive automatic inlining and so provides more opportunities for optimization than typical hand written C code.

The Libraries

The Sather libraries currently contain several hundred classes and new ones are continually being written. Eventually, we hope to have efficient, well-written classes in every area of computer science. The libraries are covered by an unrestrictive license which encourages the sharing of software and crediting authors without prohibiting use in proprietary and commercial projects. Currently there are classes for basic data structures, numerical algorithms, geometric algorithms, graphics, grammar manipulation, image processing, statistics, user interfaces, and connectionist simulations.

pSather

Sather is also being extended to support parallel programming. An initial version of the language "pSather" (Murer, Feldman, and Lim, 1993) runs on the Sequent Symmetry and the Thinking Machines CM-5. pSather adds constructs for creating and manipulating threads, synchronization, locking variables, and working in parallel address spaces. The issues which make object-oriented programming important in a serial setting are even more important in parallel programming. Efficient parallel algorithms are often quite complex and should be encapsulated in well-written library classes. Different parallel architectures often require the use of different algorithms for optimal efficiency. The object-oriented approach allows the optimal version of an algorithm to be selected according to the machine it is actually running on. It is often the case that parallel code development is done on simulators running on serial machines. A powerful object-oriented approach is to write both simulator and machine versions of the fundamental classes in such a way that a user's code remains unchanged when moving between them.

Conclusions

We have described some of the fundamental design issues underlying Sather 1.0. The language is quite young but we are excited by its prospects. The user community is growing and new class development has become an international cooperative effort. We invite you join in its development!

Acknowledgements

Sather has adopted ideas from a number of other languages. Its primary debt is to Eiffel, designed by Bertrand Meyer, but it has also been influenced by C, C++, CLOS, CLU, Common Lisp, Dylan, ML, Modula-3, Oberon, Objective C, Pascal, SAIL, Self, and Smalltalk. Many people have contributed to the development and design of Sather. The contributions of Jeff Bilmes, Ari Huttunen, Jerry Feldman, Chu-Cheow Lim, Stephan Murer, Heinz Schmidt, David Stoutamire, and Clemens Szyperski were particularly relevant to the issues discussed in this article.

Bibliography

(ICSI Technical reports are available by anonymous ftp from "ftp.icsi.berkeley.edu")

Chu-Cheow Lim and Andreas Stolcke, "Sather Language Design and Performance Evaluation," Technical Report TR-91-034, International Computer Science Institute, Berkeley, Ca., May 1991.

Bertrand Meyer. Eiffel: The Language. Prentice Hall, New York, 1992.

Stephan Murer, Stephen Omohundro, and Clemens Szyperski, "Sather Iters: Object-oriented Iteration Abstraction" , submitted to ACM Letters on Programming Languages and Systems, 1993.

Stephan Murer, Jerome Feldman, and Chu-Cheow Lim, "pSather: Layered Extensions to an Object-Oriented Language for Efficient Parallel Computation", Technical Report TR-93-028, International Computer Science Institute, Berkeley, Ca., June 1993.

Stephen Omohundro, "Sather Provides Non-proprietary Access to Object-oriented Programming", Computers in Physics, 6(5):444-449, 1992.

Stephen Omohundro and Chu-Cheow Lim, "The Sather Language and Libraries," Technical report TR-92-017, International Computer Science Institute, Berkeley, Ca., 1991.

Heinz Schmidt and Stephen Omohundro, "Clos, Eiffel, and Sather: A Comparison," ICSI Technical Report TR-91-047, to appear in "The CLOS Book", edited by Andreas Paepcke, MIT Press.

Last change: 5/30/96
The Sather Team (sather@icsi.berkeley.edu)