Rogue Wave banner
Previous fileTop of DocumentContentsIndex pageNext file
Essential Tools Module User's Guide

6.19 Smalltalk-like Collection Classes

The Smalltalk-like collection classes are based on the language Smalltalk-80 and require that their collected objects singly or multiply inherit from the abstract base class RWCollectable.

These collection classes have a few disadvantages: they are slightly slower and not completely typesafe, and their objects are slightly larger. These disadvantages are easily outweighed by the power of these classes, and their clean programming interface. Most importantly, the Smalltalk-like collection classes are well-suited for heterogeneous collections and polymorphic persistence.

Many of the Essential Tools Module Smalltalk-like classes have a typedef to either the corresponding Smalltalk names, or to a generic name. This typedef is activated by defining the preprocessor macro RW_STD_TYPEDEFS. Although you are free to use typedefs, we do encourage you to use the actual class names to make your code more maintainable.

6.19.1 Tables of the Smalltalk-like Classes

The following table and figure summarize the Essential Tools Module Smalltalk-like classes. Table 15 lists all 17 classes, along with their typedefs, iterators, and implementations. Figure 2 illustrates the class hierarchy.

Table 15: Smalltalk-like collection classes and their iterators, typedefs, and implementations 

Class Iterator Smalltalk typedef (deprecated) Implemented as

RWBag

RWBagIterator

Bag

Dictionary of occurrences

RWBinaryTree

RWBinaryTreeIterator

SortedCollection

Binary tree

RWBTree

   

B-tree in memory

RWBTreeDictionary

   

B-tree of associations

RWCollection

RWIterator

Collection

Abstract base class

RWDlistCollectables

RWDlistCollectablesIterator

 

Doubly-linked list

RWHashTable

RWHashTableIterator

 

Hash table

RWHashDictionary

RWHashDictionaryIterator

Dictionary

Hash table of associations

RWIdentityDictionary

RWHashDictionaryIterator

IdentityDictionary

Hash table of associations

RWIdentitySet

RWSetIterator

IdentitySet

Hash table

RWOrdered

RWOrderedIterator

OrderedCollection

Vector of pointers

RWSequenceable

RWIterator

Sequenceable

Abstract base class

RWSet

RWSetIterator

Set

Hash table

RWSlistCollectables

RWSlistCollectablesIterator

LinkedList

Singly-linked list

RWSlistCollectablesQueue

(n/a)

Queue

Singly-linked list

RWSlistCollectablesStack

(n/a)

Stack

Singly-linked list

RWSortedVector

RWSortedVectorIterator

 

Vector of pointers, using insertion sort

Figure 2: The class hierarchy of the Smalltalk-like collection classes


NOTE -- Some of these classes use multiple-inheritance: this hierarchy is shown relative to the RWCollectable base class.

6.19.2 Example

To orient ourselves, we always like to start with an example. The following example uses a SortedCollection to store and order a set of RWCollectableStrings. SortedCollection is actually a typedef for the Smalltalk-like collection class RWBinaryTree. Objects inserted into it are stored in order according to their relative values as returned by the virtual function compareTo(). (See Section 6.17.4.)

Here is the code:

Program Output:

The following explains the code line-by-line:

//1 By defining the preprocessor macro RW_STD_TYPEDEFS, we enable the set of Smalltalk-like typedefs. We can then use the typedef SortedCollection instead of RWBinaryTree, its true identity.
//2 The second #include declares class RWCollectableString, a derived class that multiply inherits from its base classes RWCString and RWCollectable. RWCollectableString inherits functionality from RWCString, and "ability to be collected" from class RWCollectable.
//3 An empty SortedCollection was created at this line.
//4-//7  Four RWCollectableStrings were created off the heap and inserted into the collection, in no particular order. See the Essential Tools Module Reference Guide for details on constructors for RWCollectableStrings. The objects allocated here normally should be deleted before the end of the program, but we omitted this step to make the example more concise.
//8  A pointer to an RWCollectableString was declared and defined here.
//9  An iterator was constructed from the SortedCollection sc.
//10  The iterator is then used to step through the entire collection, retrieving each value in order. The function call operator operator() has been overloaded so that the iterator means "step to the next item and return a pointer to it." All Essential Tools Module iterators work this way. See Stroustrup (1986, Section 7.3.2) for an example and discussion of iterators, as well as Section 6.6 of this manual. The typecast: str = (RWCollectableString*)sci() is necessary because the iterator returns an RWCollectable*; that is, a pointer to an RWCollectable which must then be cast into its actual identity.
//11  Finally, the pointer str is dereferenced and printed. The ability of an RWCollectableString to be printed is inherited from its base class RWCString.

When run, the program prints out the four collected strings in order; for class RWCollectableString, this means lexicographical order.

6.19.3 Choosing a Smalltalk-like Collection Class

Now that you have reviewed the list of Smalltalk-like collection classes, their class hierarchy, and an example of how they work, you may be wondering how you can use them. This section gives an overview of the various Smalltalk-like collection classes to help you choose an appropriate one for your problem. You can also see Appendix A, on choosing.

6.19.3.1 Bags Versus Sets Versus Hash Tables

Class RWHashTable is the easiest Smalltalk-like collection class to understand. It uses a simple hashed lookup to find the bucket where a particular object occurs, then does a linear search of the bucket to find the object. A key concept is that multiple objects that test isEqual to each other can be inserted into a hash table.

Class RWBag is similar to RWHashTable, except that it counts occurrences of multiple objects with the same value; that is, it retains only the first occurrence and merely increments an occurrence count for subsequent ones. RWBag is implemented as a dictionary, where the key is the inserted object and the value is the occurrence count. This is the same way the Smalltalk Bag object is implemented. Note that this implementation differs significantly from many other C++ Bag classes which are closer to the RWHashTable class and not true Bags.

Class RWSet is similar to its base class RWHashTable, except that it doesn't allow duplicates. If you try to insert an object that isEqual to an object already in RWSet, the object will be rejected.

Class RWIdentitySet, which inherits from RWSet, retrieves objects on the basis of identity instead of value. Because RWIdentitySet is a Set, it can take only one instance of a given object.

Note that the ordering of objects in any of these classes based on hash tables is not meaningful. If ordering is important, you should choose a sequenceable class.

6.19.3.2 Sequenceable Classes

Classes inheriting from RWSequenceable have an innate ordering. You can speak meaningfully of their first or last object, or of their 6th or ith object.

These classes are implemented generally as either a vector, or a singly-linked or doubly-linked list. Vector-based classes make good stacks and queues, but poor insertions in the middle. If you exceed the capacity of a vector-based collection class, it will automatically resize, but it may exact a significant performance penalty to do so.

Note that the binary and B-tree classes can be considered sequenceable in the sense that they are sorted, and therefore have an innate ordering. However, their ordering is determined internally by the relative value of the collected objects, rather than by an insertion order. In other words, you cannot arbitrarily insert an object into a sorted collection in any position you want because it might not remain sorted. Hence, these classes are subclassed separately.

6.19.3.3 Dictionaries

Sometimes referred to as maps, dictionaries use an external key to find a value. The key and value may be of different types, and in fact usually are. You can think of dictionaries as associating a given key with a given value. For example, if you were building a symbol table in a compiler, you might use the symbol name as the key, and its relocation address as the value. This approach would contrast with your approach for a Set, where the name and address would have to be encapsulated into one object.

The Essential Tools Module provides two dictionary classes: RWHashDictionary, implemented as a hash table, and RWBTreeDictionary, implemented as a B-tree. For these classes, both keys and values must inherit from the abstract base class RWCollectable.

6.19.4 Virtual Functions Inherited From RWCollection

The Smalltalk-like collection classes inherit from the abstract base class RWCollection, which in turn inherits from the abstract base class RWCollectable, described in Section 6.16, Section 6.17, Section 6.18, and Section 6.19.1. (Thus do we produce collections of collections, but that is another story.)

An abstract base class is a class intended to be inherited by some other class, not used as itself per se. If you think of it as a kind of virtual class, you can easily project the meaning of virtual functions. These virtual functions provide a blueprint of functionality for the derived class. As an abstract base class, RWCollection provides a blueprint for collection classes by declaring various virtual functions, such as insert(), remove(), entries(), and so on.

This section describes the virtual functions inherited by the Smalltalk-like collections. Any of these collections can be expected to understand them.

6.19.4.1 insert()

You can put a pointer to an object into a collection by using the virtual function insert():

This function inserts in the way most natural for the collection. Faced with a stack, it pushes an item onto the stack. Faced with a queue, it appends the item to the queue. In a sorted collection, it inserts the new item so that items before it compare less than itself, items after it compare greater than itself, and items equal compare equal, if duplicates are allowed. See the example in Section 6.19.2 for an example using insert().

You must always insert pointers to real objects. Since all RWCollection classes need to dereference their contents for some methods such as find(), inserting a zero will cause such methods to crash. If you must store an empty object, we suggest you create and insert a default constructed object of the appropriate type, such as RWCollectable*.

6.19.4.2 find() and Friends

You can use the following virtual functions to test how many objects a collection contains, and whether it contains a particular object:

The function isEmpty() returns TRUE if the collection contains no objects. The function entries() returns the total number of objects that the collection contains.

The function contains() returns TRUE if the argument is equal to an item within the collection. The meaning of is equal to depends on the collection and the type of object being tested. Hashing collections use the virtual function isEqual() to test for equality, after first hashing the argument to reduce the number of possible candidates to those in one hash bucket. (Here it is important that all items which are isEqual with each other hash to the same value!). Sorted collections search for an item that compares equal to the argument; in other words, an item for which compareTo() returns zero.

The virtual function occurrencesOf() is similar to contains(), but returns the number of items that are equal to the argument.

The virtual function find() returns a pointer to an item that is equal to its argument.

The following example, which builds on the example in Section 6.19.2, uses find() to find occurrences of Mary in the collection, and occurrencesOf() to find the number of times Mary occurs:

Program Output:

Here is the line-by-line description:

//1-//7  These lines are from Section 6.19.2.
//8  Insert another instance with the value Mary.
//9  This statement prints out 5, the total number of entries in the sorted collection.
//10  A throwaway variable dummy is constructed, to be used to test for the occurrences of strings containing Mary.
//11  The collection is asked to return a pointer to the first object encountered that compares equal to the argument. A nil pointer (zero) is returned if there is no such object.
//12  The pointer is tested to make sure it is not nil.
//13  Paranoid check. In this example, it is obvious that the items in the collection must be of type RWCollectableString. In general, it may not be obvious.
//14  Because of the results of step 13, the cast to an RWCollectableString pointer is safe. The pointer is then dereferenced and printed.
//15  If the pointer t was nil, then an error message would have been printed here.
//16  The call to occurrencesOf() returns the number of items that compare equal to its argument. In this case, two items are found, the two occurrences of Mary.

6.19.4.3 remove() Functions

To search for and remove particular items, you can use the functions remove() and removeAndDestroy():

The function remove() looks for an item that is equal to its argument and removes it from the collection, returning a pointer to it. It returns nil if no item is found.

The function removeAndDestroy() is similar except it deletes the item instead of returning it, using the virtual destructor inherited by all RWCollectable items. You must be careful when using this function that the item was actually allocated off the heap, not the stack, and that it is not shared with another collection.

The following example, which expands on the previous one, demonstrates the use of the virtual function removeAndDestroy():

//17  Removes the first occurrence of the string containing Mary and returns a pointer to it. This pointer will be nil if there is no such item.
//18  Deletes the item, which was originally allocated off the heap. There is no need to check the pointer against nil because the language guarantees that it is always OK to delete a nil pointer.
//19  In this statement, the remaining occurrence of Mary is both removed and deleted.

6.19.4.4 apply() Functions

To efficiently examine the members of a Smalltalk-like collection, use the member function apply():

The first argument, RWapplyCollectable, is a typedef:

In other words, RWapplyCollectable is a pointer to a function with prototype:

where yourApplyFunction is the name of the function. You must supply this function. It will be called for each item in the collection, in whatever order is appropriate for the collection, and passed as a pointer to the item as its first argument. The second argument x is passed through from the call to apply(), and is available for your use. For example, you could use it to hold a handle to a window on which the object is to be drawn.

Note that the apply() functions of the Smalltalk-like collections and the generic collections are similar. (Compare Section 7.5.2.) The difference is in the type of the first argument of the user-supplied function: the Smalltalk-like collections use RWCollectable*, while the generic collections use type*. With both sets of collections, you must be careful that you cast the pointer item to the proper derived class.

The apply-functions generally employ the most efficient method for examining all members of the collection. This is their great advantage. Their disadvantage is that they are slightly clumsy to use, requiring you to supply a separate function. The functional equivalent to apply() in the Smalltalk world is do. It takes just one argument: a piece of code to be evaluated for each item in the collection.

6.19.4.5 Functions clear() and clearAndDestroy()

To remove all items from the collection, you can use the functions clear() and clearAndDestroy():

The function clearAndDestroy() not only removes the items, but also calls the virtual destructor for each item. You must use this function with care. The function does check to see if the same item occurs more than once in a collection (by building an RWIdentitySet internally), and thereby deletes each item only once. However, it cannot check whether an item is shared between two different collections. You must also be certain that every member of the collection was allocated off the heap.

6.19.5 Other Functions Shared by All RWCollections

There are several other functions that are shared by all classes that inherit from RWCollection. Note that these are not virtual functions.

6.19.5.1 Class Conversions

The following functions allow any collection class to be converted into an RWBag, RWSet, RWOrdered, or a SortedCollection (that is, an RWBinaryTree):

Note that since these functions mimic similar Smalltalk methods, they return a copy of the new collection by value. For large collections, this can be very expensive. Consider using operator+=() instead.

6.19.5.2 Inserting and Removing Other Collections

You can use these functions to respectively insert or remove the contents of their argument.

6.19.5.3 Selection

The function select():

evaluates the function pointed to by tst for each item in the collection. It inserts those items for which the function returns TRUE into a new collection of the same type as self and returns a pointer to it. This new collection is allocated off the heap, so you are responsible for deleting it when done.

6.19.6 Virtual Functions Inherited from RWSequenceable

Collections that inherit from the abstract base class RWSequenceable, which inherits from RWCollectable, have an innate, meaningful ordering. This section describes the virtual functions inherited from RWSequenceable which make use of that ordering. For example, the following virtual functions allow access to the ith item in the collection:

Remember that the first item in any collection is at position i=0. The compiler chooses which function to use on the basis of whether or not your collection has been declared const: the second variant of the function is for const collections, the first for all others. The first variant can also be used as an lvalue, as in the following example:

As you might expect, the operations above are efficient for the class RWOrdered, which is implemented as a vector, but relatively inefficient for a class implemented as a linked-list, because the entire list must be traversed to find a particular index.

The following virtual functions return the first or last item in the collection, respectively, or nil if the collection is empty:

The next virtual function returns the index of the first object that is equal to the argument, or the special value RW_NPOS if there is no such object:

Here is an example of the index function in use. The output shows that the index of the variable we were searching for was found at position 1.

Program Output:

Finally, you can use the following function to insert an item at a particular index:

In the example below, the code uses the function insertAt to insert 4 at position 1.

6.19.7 A Note on How Objects are Found

You may save yourself some difficulty by remembering the following point: the virtual functions of the object within the collection, not those of the target, are called when comparing or testing a target for equality.

The following code fragment illustrates the point:

In this example, the virtual functions of member within the collection RWCollectableString are called, not the virtual functions of target. In other words:

6.19.7.1 Hashing

Hashing is an efficient method for finding an object within a collection. All the collection classes that use hashing follow the same general strategy. First, member function hash() of the target is called to find the proper bucket within the hash table. A bucket is implemented as a singly-linked list. Because all the members of a bucket have the same hash value, the bucket is linearly searched to find the exact match. This is done by calling member function isEqual() of the candidate (see above) with each member of the bucket as the argument. The first argument that returns TRUE is the chosen object. Be careful not to design your class so that two objects that test true for isEqual() can have different hash values, since this algorithm will fail for such objects.

In general, because of this combination of hashing and linear searching, as well as the complexity of most hashing algorithms, the ordering of the objects within a hash collection will not make a lot of sense. Hence, when the apply() function or an iterator scans through the hashing table, the objects will be visited in what appears to be random order.



Previous fileTop of DocumentContentsIndex pageNext file

©2004 Copyright Quovadx, Inc. All Rights Reserved.
Rogue Wave and SourcePro are registered trademarks of Quovadx, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.
Contact Rogue Wave about documentation or support issues.