Collections are containers for other objects. Smalltalk provides a rich set of collection classes ready to be used.
The protocol offered by these collections is very orthogonal, making
good use of polymorphism. For example, most collections provide methods
like at:
, includes:
or do:
,
even though the implementations are totally different and the performance
may vary greatly between different collections and access methods
(*).
To get a rough idea of the common protocol, have a look at the class
Collection
. This is the superclass of all collections,
where most of the common operations are implemented.
(Although, many subclasses redefine these methods.)
The most useful collection classes are:
less frequently used collections:
seldom used:
anArray := Array new:size
anArray := someCollection asArray
theSize := anArray size
anArray at:index put:anObject
element := anArray at:index
anArray includes:anObject
anArray indexOf:anObject
anArray do:[:element | ... do something with element ...]
#( element1 element2 ... elementN )
where the individual elements must be literals (numbers, characters,
strings or other arrays).
Nested literal Arrays are written as:
#(
(element1.1 element1.2 ... element1.M)
(element2.1 element2.2 ... element2.M)
...
(elementN.1 elementN.2 ... elementN.M)
)
Notice, the above is an Array of 3 Arrays. Not to confuse
with an array containing 9 elements (which you get, if the inner parenthesises
are ommited).
Smalltalk/X also allows the objects true
,
false
and nil
to be treated like constants in
literal arrays. This is NOT compatible with some older Smalltalk implementations,
so care should be taken if that code is to be ported to other systems later.
For example,
represents a 5-element array literal consisting of
#( nil 1 'foo' true #true )
nil
,
the integer 1
, the string 'foo'
, the boolean
true
and the symbol #true
.
theSize := aString size
aString at:index put:someCharacter
element := aString at:index
Transcript show:aString
aString print
newString := string1 , string2
newString := aString1 copyFrom:index1 to:index2
Constants strings (string literals) are entered by enclosing the characters in single quotes
('
). Single quotes within string constants are to be doubled.
String literals can cross line boundaries; in this case, the newline character
is part of the string.
Examples:
'a normal string constant'
'a string constant with embedded ''quotes'''
'a string constant
with mutliple
lines'
Therefore, symbols can be compared using identity compare (==), which is much faster than value compare (=). Identity compare is technically implemented as a pointer compare, while value compare on strings is implementated by comparing each element of the two objects.
Symbols are internally used by the system to assign names to class objects (in the Smalltalk global table), and to assign method names to method objects (in each classes method/selector association). Also, symbols are perfect to use as keys in IdentityDictionaries.
Since symbols are handled just as fast as integers, they are also perfect for flags, states (state machines) or other variables which are represented by enumeration types or define-constants in other programming languages.
Symbols do not allow change of their character elements; the at:put:
method will raise an error. Other than that, all read accesses are the same as for
strings (actually, class Symbol inherits all those methods from the String class).
Symbols can be created (at runtime) from a string with:
also, symbol constants (i.e. symbol literals)
can be written directly by prefixing some identifier with the #-character as in:
sym := aString asSymbol
These symbol literals are created at compile time -
at execution time,so the cost of using them is not more than using integers.
sym := #helloThere.
Only alphanumeric identifiers or valid message selectors
are allowed in the above example.
Examples for valid symbol iterals are:
Use quotes around the characters for other symbol literals:
#helloThere
#FooBar
#+
#at:
#at:put:
The quotes don't hurt; therefore
sym := #'a symbol with spaces and .... other stuff'
#+
and #'+'
represent
the same symbol.
Symbol literals can also be present in array literals
as in:
Within such an array literal, the leading #-character can be ommitted
(for your convenience).
Thus, the above is equivalent to:
#( #foo #bar #baz )
but, NOT equivalent to:
#( foo bar baz )
which is an array of strings.
#( 'foo' 'bar' 'baz' )
Since true
, false
and nil
are treated
as constant objects, you have to add a #-prefix as in:
if you want those symbols in an array literal (which is unlikely).
#( #true #false)
=
, which typically compares objects values.
1
(the Integer "one") will be considered
equal to 1.0
(the Float "one").
IdentitySet
uses an identity compare
(==
) and will treat 1
and 1.0
as different
objects.
aSet := Set new
aSet := Set new:1000
theSize := aSet size
aSet add:anObject
aSet remove:anObject
aSet includes:anObject
aSet do:[:element | ... do something with element ...]
at:
and at:put:
) are
not allowed and will trigger an error.
Sets and identitySets use hashing algorithms internally for inclusion tests. Thus, they are usually fast and show a constant access time which is determined by their fill-grade and the quality of the hash key algorithm; not by the absolute size of the set. The hash key generation is implemented by the elements - not by the set.
Sets automatically grow (and shrink) as required when elements are added or removed. To get good hashing performance, they grow larger (by some percentage) than the actual size required for the elements. Expect some 20% storage overhead when using sets.
IdentitySets are usually much faster than Sets, since both hashKey generation
and comparing is implemented more efficient in the first.
If your keys are small integers or symbols, you should always use IdentitySets.
However, for strings, identitySet is usually a bad choice.
at:
and at:put:
messages using an access key.
However, in contrast to arrays they allow any other object to be used as key.
Thus an array can be thought of as a special dictionary using numeric keys
only (if you want to do this, its faster to use an array rightaway - except
if your array is filled very sparse).
Dictionary
uses
value compare (=
) and IdentityDictionary
which uses
identity compare (==
) for key accesses.
|d value|
d := Dictionary new.
d at:1 put:'one'.
...
value := d at:1.0
will leave the expected 'one'
in the variable value.
|d value|
d := IdentityDictionary new.
d at:1 put:'one'.
...
value := d at:1.0
will raise an error, since there is no element stored under that key
(remember: 1 = 1.0
returns true, while 1 == 1.0
returns false).
Like with sets, dictionaries use hashing internally; IdentityDictionaries are usually faster than Dictionaries.
Dictionaries are very powerful collections, and worth a second look.
You can put associations between any objects into dictionaries.
A simple example:
You can also use Dictionaries as a replacement for simple data structures,
which are called record or struct in other languages:
|nameToAge|
nameToAge := Dictionary new.
nameToAge at:'fred' put:17.
nameToAge at:'peter' put:21.
...
age := nameToAgeAt:name ifAbsent:['dont know'].
Of course, real programmers define a class Person and create an
instance of it ... this is both more 'object oriented' and provides
faster access than using dictionaries (which are not meant to be used
for the above 'constant' key access).
|peter marry|
peter := IdentityDictionary new.
peter at:#age put:23.
peter at:#firstName put:'peter'.
peter at:#middleName put:'m'.
peter at:#name put:'smalltalker'.
peter at:#salary put:51234.
...
marry := IdentityDictionary new.
marry at:#age put:21.
marry at:#firstName put:'marry'.
marry at:#middleName put:'s'.
marry at:#name put:'smalltalker'.
marry at:#husband put:peter.
...
age := peter at:#age.
firstNameOfHusband := (marry at:#husband) at:#firstName.
typical use:
coll := OrderedCollection new
coll := OrderedCollection new:1000
theSize := coll size
coll add:anObject
coll addFirst:anObject
coll removeFirst
coll removeLast
coll includes:anObject
coll do:[:element | ... do something with element ...]
includes:
when applied to large
collections is usually much slower for OrderedCollection or Array than it
is for Sets or Dictionaries. The reason is that the former have to do
a sequential search over all elements, while the later can use a fast
hashed access.
Sorting is done using quickSort and binary search (when inserting elements). typical use:
sorted := someOtherCollection asSortedCollection
sorted := SortedCollection new
sorted sortBlock:[:a :b | a > b]
sorted sortBlock:[:a :b | a < b]
sorted sortBlock:[:a :b | a asUppercase < b asUppercase]
sorted add:someObject
sorted includes:someObject
Bytearrays are also valuable for data interchange with other programs via
files, sockets, shared memory or pipes.
Protocolwise, they behave much like Arrays.
(2 to:10 by:2)
.
Link
or
ValueLink instances. They provide reasonable fast
insertion methods, but show poor performance for deletion and insertion
tests.
SharedQueue
(which is a subclass of Queue
) and helps in writing
producer/consumer applications with separate writer/reader processes.
Notes:
(*)
Some collections are tuned for growing, while others are not;
some use hashing or other fast access methods to support fast
search and/or inclusion test, while others do sequential searches.
Check out and understand the individual collection classes and use the one
which best fits your needs to get best performance.
Copyright © Claus Gittinger Development & Consulting, all rights reserved
(cg@ssw.de)