Chapter 12: Abstract Containers

Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken.

Please state the document version you're referring to, as found in the title (in this document: 9.0.2) and please state chapter and paragraph name or number you're referring to.

All received mail is processed conscientiously, and received suggestions for improvements are usually processed by the time a new version of the Annotations is released. Except for the incidental case I will normally not acknowledge the receipt of suggestions for improvements. Please don't interpret this as me not appreciating your efforts.

C++ offers several predefined datatypes, all part of the Standard Template Library, which can be used to implement solutions to frequently occurring problems. The datatypes discussed in this chapter are all containers: you can put stuff inside them, and you can retrieve the stored information from them.

The interesting part is that the kind of data that can be stored inside these containers was left unspecified when the containers were designed. That's why they are called abstract containers.

Abstract containers rely heavily on templates, covered in chapter 20 and beyond. To use abstract containers, only a minimal grasp of the template concept is required. In C++ a template is in fact a recipe for constructing a function or a complete class. The recipe tries to abstract the functionality of the class or function as much as possible from the data on which the class or function operates. As the data types on which the templates operate were not known when the template was implemented, the datatypes are either inferred from the context in which a function template is used, or they are mentioned explicitly when a class template is used (the term that's used here is instantiated). In situations where the types are explicitly mentioned, the angle bracket notation is used to indicate which data types are required. For example, below (in section 12.2) we'll encounter the pair container, which requires the explicit mentioning of two data types. Here is a pair object containing both an int and a string:

    pair<int, string> myPair;
The object myPair is defined as an object holding both an int and a string.
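The following minimal sketch makes the difference between inferred and explicitly mentioned types concrete. The function names larger and makePair are hypothetical and serve only as illustration. The standard function template std::max infers its type from its arguments, whereas the class template pair needs its types between angle brackets:

```cpp
#include <algorithm>
#include <string>
#include <utility>

// Function template: std::max's type parameter is inferred from the
// arguments, so std::max<int> is instantiated here.
int larger(int first, int second)
{
    return std::max(first, second);
}

// Class template: the types are mentioned explicitly between angle brackets.
std::pair<int, std::string> makePair()
{
    std::pair<int, std::string> myPair(1, "one");
    return myPair;
}
```

Calling larger(3, 7) returns 7, and makePair().second equals "one".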

The angle bracket notation is used intensively in the upcoming discussion of abstract containers. Actually, understanding this part of templates is the only real requirement for using abstract containers. Now that we've introduced this notation, we can postpone the more thorough discussion of templates to chapter 20, and concentrate on their use in this chapter.

Most of the abstract containers are sequential containers: they contain data that can be stored and retrieved in some sequential way. Examples are the vector, implementing an extendable array; the list, implementing a data structure that allows for the easy insertion or deletion of data; the queue, also called a FIFO (first in, first out) structure, in which the first element that is entered is the first element to be retrieved again; and the stack, which is a first in, last out (FILO or LIFO) structure.

In addition to sequential containers several special containers are available. The pair is a basic container in which a pair of values (of types that are left open for further specification) can be stored, like two strings, two ints, a string and a double, etc. Pairs are often used to return data elements that naturally come in pairs. For example, the map is an abstract container storing keys and their associated values. Elements of these maps are returned as pairs.

A variant of the pair is the complex container, implementing operations that are defined on complex numbers.

All abstract containers described in this chapter as well as the string and stream datatypes (cf. chapters 5 and 6) are part of the Standard Template Library.

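All containers except the unordered containers support the following basic set of operators: the overloaded assignment operator, which assigns a container to another container of the same type; the equality operators == and !=, where two containers are equal if they contain the same number of elements and those elements are pairwise equal; and the ordering operators <, <=, > and >=, which compare two containers lexicographically, element by element.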
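Here is a minimal sketch of these operators applied to vectors. The function names are hypothetical, and a compiler supporting the C++0x initializer lists is assumed:

```cpp
#include <vector>

// Assignment followed by a test for equality.
bool sameAfterAssignment()
{
    std::vector<int> left{1, 2, 3};
    std::vector<int> right;
    right = left;               // assignment copies all elements
    return left == right;       // same size, pairwise equal elements
}

// Lexicographical ordering of containers.
bool lexicographicallyLess()
{
    std::vector<int> shorter{1, 2};
    std::vector<int> longer{1, 2, 0};
    std::vector<int> bigger{1, 3};
    // {1, 2} is a prefix of {1, 2, 0}, so it compares less;
    // {1, 2, 0} < {1, 3} because 2 < 3 at the first difference.
    return shorter < longer && longer < bigger;
}
```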

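Note that before a user-defined type (usually a class type) can be stored in a container, it should at least support the operations the container applies to its elements. These include copying (copy construction and assignment) and, depending on the container and the members that are used, default construction, comparison for equality (operator==) and ordering (operator<).

With the advent of the C++0x standard, sequential containers can also be initialized using initializer lists.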
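The following sketch assumes a C++0x (C++11) compiler and uses hypothetical function names. It initializes a vector and a list from initializer lists:

```cpp
#include <list>
#include <string>
#include <vector>

std::vector<int> primes()
{
    return std::vector<int>{2, 3, 5, 7};        // initializer list constructor
}

std::list<std::string> words()
{
    std::list<std::string> lst{"alpha", "bravo"};
    return lst;
}
```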

Most containers offer a member function max_size, returning the maximum number of elements the container can hold. Exceptions are the stack (section 12.3.10), priority_queue (section 12.3.4), and queue (section 12.3.3) containers.

Closely linked to the standard template library are the generic algorithms. These algorithms may be used to perform frequently occurring tasks, or tasks that are more complex than is possible with the containers themselves, like counting, filling, merging, filtering etc. An overview of generic algorithms and their applications is given in chapter 19. Generic algorithms usually rely on the availability of iterators, representing begin and end-points for processing data stored inside containers. The abstract containers usually support constructors and members expecting iterators, and they often have members returning iterators (comparable to the string::begin and string::end members). In this chapter the iterator concept is not further investigated. Refer to chapter 18 for this.
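The following sketch shows containers and generic algorithms cooperating through iterators. The function names are hypothetical. A vector is constructed from a pair of pointers, which act as iterators. Then the generic algorithms std::sort and std::count are applied to the vector's begin and end iterators:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<int> sortedCopy(int const *begin, int const *end)
{
    std::vector<int> vec(begin, end);       // constructor expecting iterators
    std::sort(vec.begin(), vec.end());      // generic algorithm on [begin, end)
    return vec;
}

std::ptrdiff_t countValue(std::vector<int> const &vec, int value)
{
    return std::count(vec.begin(), vec.end(), value);
}
```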

The url http://www.sgi.com/Technology/STL is worth visiting as it offers more extensive coverage of abstract containers and the standard template library than can be provided by the C++ annotations.

Containers often collect data during their lifetimes. When a container goes out of scope, its destructor tries to destroy its data elements. This only succeeds if the data elements themselves are stored inside the container. If the data elements of containers are pointers to dynamically allocated memory then the memory pointed to by these pointers is not destroyed, resulting in a memory leak. A consequence of this scheme is that the data stored in a container should often be considered the `property' of the container: the container should be able to destroy its data elements when the container's destructor is called. So, normally containers should not contain pointers to data. Also, a container should not be required to contain const data, as const data prevent the use of many of the container's members, like the assignment operator.
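You can observe the difference between storing objects and storing pointers with a class that counts its destructions. In the following sketch the class Tracked and both functions are hypothetical, and a C++0x (C++11) compiler is assumed. Destroying a vector of objects destroys those objects, while destroying a vector of pointers leaves the objects pointed at untouched:

```cpp
#include <vector>

// Counts how many Tracked objects have been destroyed.
struct Tracked
{
    static int s_destroyed;
    ~Tracked()
    {
        ++s_destroyed;
    }
};
int Tracked::s_destroyed = 0;

int destroyedByVectorOfObjects()
{
    Tracked::s_destroyed = 0;
    {
        std::vector<Tracked> objects(3);    // three objects stored
    }                                       // the vector destroys all three
    return Tracked::s_destroyed;
}

int destroyedByVectorOfPointers()
{
    Tracked::s_destroyed = 0;
    std::vector<Tracked *> pointers;
    {
        std::vector<Tracked *> local;
        for (int idx = 0; idx != 3; ++idx)
            local.push_back(new Tracked);
        pointers = local;                   // keep copies of the pointers
    }                                       // only the pointers are destroyed
    int count = Tracked::s_destroyed;       // no Tracked object was destroyed

    for (Tracked *ptr: pointers)            // clean up to avoid a leak
        delete ptr;
    return count;
}
```

Here destroyedByVectorOfObjects returns 3 and destroyedByVectorOfPointers returns 0. Without the final delete loop the three Tracked objects would have leaked.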

12.1: Notations used in this chapter

In this chapter about containers, the following notational conventions are used:

Some containers, e.g., the map container, contain pairs of values, usually called `keys' and `values'. For such containers the following notational convention is used in addition:

12.2: The `pair' container

The pair container is a rather basic container. It is used to store two elements, called first and second, and that's about it. Before using pair containers the header file <utility> must have been included.

The pair's data types are specified when the pair object is defined (or declared) using the template's angle bracket notation (cf. chapter 20). Examples:

    pair<string, string> piper("PA28", "PH-ANI");
    pair<string, string> cessna("C172", "PH-ANG");
here, the variables piper and cessna are defined as pair variables containing two strings. Both strings can be retrieved using the first and second fields of the pair type:
    cout << piper.first << '\n' <<      // shows 'PA28'
            cessna.second << '\n';      // shows 'PH-ANG'
The first and second members can also be used to reassign values:
    cessna.first = "C152";
    cessna.second = "PH-ANW";
If a pair object must be completely reassigned, an anonymous pair object can be used as the right-hand operand of the assignment. An anonymous variable defines a temporary variable (which receives no name) solely for the purpose of (re)assigning another variable of the same type. Its generic form is
    type(initializer list)
Note that mentioning the container name pair alone does not complete the type specification of a pair object: the data types stored in the pair must be specified as well. Again, the (template) angle bracket notation is used for this. E.g., the reassignment of the cessna pair variable could have been accomplished as follows:
    cessna = pair<string, string>("C152", "PH-ANW");
In cases like these, the type specification can become quite elaborate, which has caused a revival of interest in the possibilities offered by the typedef keyword. If many pair<type1, type2> clauses are used in a source, the typing effort may be reduced and readability might be improved by first defining a name for the clause, and then using the defined name later. E.g.,
    typedef pair<string, string> pairStrStr;

    cessna = pairStrStr("C152", "PH-ANW");
Apart from this and the basic set of operations (assignment and comparisons), the pair offers no further functionality. It is, however, a basic ingredient of the upcoming abstract containers map, multimap and unordered_map.
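The following sketch assumes a C++0x compiler and uses hypothetical function names. It combines the pairStrStr typedef, member and whole-pair reassignment, and comparisons. It also uses std::make_pair, a standard function not discussed above, which infers the pair's types from its arguments. Pairs compare their first members first, and use their second members only when the first members are equal:

```cpp
#include <string>
#include <utility>

typedef std::pair<std::string, std::string> pairStrStr;

pairStrStr reassigned()
{
    pairStrStr cessna("C172", "PH-ANG");
    cessna.first = "C152";                      // reassign one member
    cessna = pairStrStr("C152", "PH-ANW");      // reassign the whole pair
    return cessna;
}

bool comparePairs()
{
    pairStrStr piper("PA28", "PH-ANI");
    pairStrStr cessna = std::make_pair(std::string("C172"),
                                       std::string("PH-ANG"));
    // "C172" < "PA28", so the second members are not inspected
    return cessna < piper && piper == pairStrStr("PA28", "PH-ANI");
}
```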

The C++0x standard offers a generalized pair container: the tuple, covered in section 21.5.5.

12.3: Sequential Containers

12.3.1: The `vector' container

The vector class implements an expandable array. Before using the vector container the <vector> header file must have been included.
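Here is a minimal sketch of the vector's expandable nature. The function name squares is hypothetical. Starting from an empty vector, push_back adds elements at the end and the vector grows as needed. The elements can then be accessed using, e.g., the index operator, front and back:

```cpp
#include <vector>

std::vector<int> squares(int count)
{
    std::vector<int> vec;                   // initially empty
    for (int idx = 0; idx != count; ++idx)
        vec.push_back(idx * idx);           // the vector grows as needed
    return vec;
}
```

For example, squares(4) returns a vector holding 0, 1, 4 and 9.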

The following constructors, operators, and member functions are available:

12.3.2: The `list' container

The list container implements a list data structure. Before using a list container the header file <list> must have been included.

The organization of a list is shown in figure 8.

Figure 8: A list data-structure


Figure 8 shows that a list consists of separate list-elements, connected by pointers. The list can be traversed in two directions: starting at Front the list may be traversed from left to right, until the 0-pointer is reached at the end of the rightmost list-element. The list can also be traversed from right to left: starting at Back, the list is traversed from right to left, until eventually the 0-pointer emanating from the leftmost list-element is reached.
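You can sketch traversal of a list in both directions using the list's iterators. begin and end traverse it from Front to Back, while the reverse iterators rbegin and rend traverse it from Back to Front. The function names below are hypothetical. The list's elements are collected in a string to make the traversal order visible:

```cpp
#include <list>
#include <string>

std::string forward(std::list<char> const &lst)
{
    return std::string(lst.begin(), lst.end());     // Front to Back
}

std::string backward(std::list<char> const &lst)
{
    return std::string(lst.rbegin(), lst.rend());   // Back to Front
}
```

For a list containing 'a', 'b' and 'c', forward returns "abc" and backward returns "cba".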

As a subtlety note that the representation given in figure 8 is not necessarily used in actual implementations of the list. For example, consider the following little program:

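    #include <iostream>
    #include <list>
    using namespace std;

    int main()
    {
        list<int> l;
        // undefined behavior: front() is called for an empty list
        cout << "size: " << l.size() << ", first element: " <<
                l.front() << '\n';
    }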
When this program is run it might actually produce the output:
    size: 0, first element: 0
Its front element can even be assigned a value. In this case the implementor has chosen to provide the list with a hidden element: the list actually is a circular list, where the hidden element serves as terminating element, replacing the 0-pointers in figure 8. Note, however, that calling front for an empty list results in undefined behavior. The above output is not guaranteed, and programs should never rely on it. As noted, this is a subtlety, which doesn't affect the conceptual notion of a list as a data structure ending in 0-pointers. Note also that various implementations of list-structures are well known (cf. Aho, A.V., Hopcroft J.E. and Ullman, J.D., (1983) Data Structures and Algorithms (Addison-Wesley)).

Both lists and vectors are often appropriate data structures in situations where an unknown number of data elements must be stored. However, there are some rules of thumb to follow when selecting the appropriate data structure.

At present lists aren't as useful anymore as they used to be (when computers were much slower and more memory-constrained). Except maybe in some rare cases, a vector should be the preferred container, even when implementing algorithms that traditionally use lists.

Other considerations related to the choice between lists and vectors should also be given some thought. Although it is true that the vector is able to grow dynamically, the dynamic growth requires data-copying. Clearly, copying a million large data structures takes a considerable amount of time, even on fast computers. On the other hand, inserting a large number of elements in a list doesn't require us to copy non-involved data. Inserting a new element in a list merely requires us to juggle some pointers. In figure 9 this is shown: a new element is inserted between the second and third element, creating a new list of four elements.

Figure 9: Adding a new element to a list


Removing an element from a list is also fairly easy. Starting again from the situation shown in figure 8, figure 10 shows what happens if element two is removed from our list. Again: only pointers need to be juggled. In this case it's even simpler than adding an element: only two pointers need to be rerouted.

Figure 10: Removing an element from a list
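You can sketch the insertion of figure 9 and the removal of figure 10 using the list's insert and erase members. The function names are hypothetical. std::next and initializer lists assume a C++0x (C++11) compiler. In both cases no other element is copied:

```cpp
#include <iterator>
#include <list>

std::list<int> insertBetween()
{
    std::list<int> lst{1, 2, 3};
    std::list<int>::iterator third = std::next(lst.begin(), 2);
    lst.insert(third, 4);               // inserted before the third element
    return lst;                         // now: 1 2 4 3
}

std::list<int> removeSecond()
{
    std::list<int> lst{1, 2, 3};
    lst.erase(std::next(lst.begin()));  // removes the second element
    return lst;                         // now: 1 3
}
```

insertBetween returns the list 1 2 4 3 and removeSecond returns the list 1 3. Iterators to elements that were not removed remain valid.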


To summarize the comparison between lists and vectors: it's probably best to conclude that there is no clear-cut answer to the question which data structure to prefer. There are rules of thumb, which may be adhered to. But if worse comes to worst, a profiler may be required to find out what's best.

The list container offers the following constructors, operators, and member functions:

12.3.3: The `queue' container

The queue class implements a queue data structure. Before using a queue container the header file <queue> must have been included.

A queue is depicted in figure 11.

Figure 11: A queue data-structure


In figure 11 it is shown that a queue has one point (the back) where items can be added to the queue, and one point (the front) where items can be removed (read) from the queue. A queue is therefore also called a FIFO data structure, for first in, first out. It is most often used in situations where events should be handled in the same order as they are generated.
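Here is a sketch of the FIFO behavior. The function name fifoOrder is hypothetical. Characters representing events are added at the back using push, then handled at the front using front and pop:

```cpp
#include <queue>
#include <string>

std::string fifoOrder()
{
    std::queue<char> events;
    events.push('a');               // added at the back
    events.push('b');
    events.push('c');

    std::string handled;
    while (!events.empty())
    {
        handled += events.front();  // read at the front
        events.pop();               // and then removed
    }
    return handled;                 // events handled in order of arrival
}
```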

The following constructors, operators, and member functions are available for the queue container:

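Note that the queue does not support iterators or a subscript operator. The only elements that can be accessed are its front and back elements. A queue can be emptied by repeatedly removing its front element (using pop) until its member empty returns true, or by assigning an empty queue of the same type to it.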

12.3.4: The `priority_queue' container

The priority_queue class implements a priority queue data structure. Before using a priority_queue container the <queue> header file must have been included.

A priority queue is comparable to a queue, but its data elements are retrieved according to priority rules rather than in their order of arrival. A real-life priority queue is found, e.g., at airport check-in terminals. At a terminal the passengers normally stand in line to wait for their turn to check in, but late passengers are usually allowed to jump the queue: they receive a higher priority than other passengers.

The priority queue uses operator< of the data type stored in the priority queue to decide about the priority of the data elements. The smaller the value, the lower the priority. So, the priority queue could be used to sort values while they arrive. A simple example of such a priority queue application is the following program: it reads words from cin and writes a sorted list of words to cout:

#include <iostream>
#include <string>
#include <queue>
using namespace std;

int main()
{
    priority_queue<string> q;
    string word;

    while (cin >> word)
        q.push(word);

    while (q.size())
    {
        cout << q.top() << '\n';
        q.pop();
    }
}

Unfortunately, the words are listed in reversed order: because of the underlying <-operator the words appearing later in the ASCII-sequence appear first in the priority queue. A solution to that problem is to define a wrapper class around the string datatype, reversing string's operator<. Here is the modified program:

#include <iostream>
#include <string>
#include <queue>

class Text
{
    std::string d_s;

    public:
        Text(std::string const &str)
        :
            d_s(str)
        {}
        operator std::string const &() const
        {
            return d_s;
        }
        bool operator<(Text const &right) const
        {
            return d_s > right.d_s;
        }
};

using namespace std;

int main()
{
    priority_queue<Text> q;
    string word;

    while (cin >> word)
        q.push(word);

    while (q.size())
    {
        word = q.top();
        cout << word << '\n';
        q.pop();
    }
}

Other possibilities to achieve the same exist. One would be to store the contents of the priority queue in, e.g., a vector, from which the elements can be read in reversed order.
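Yet another possibility is to pass std::greater<std::string> as the priority queue's third template argument. The second template argument is the underlying container, here a vector. The smallest element then becomes the top element, so no wrapper class is needed. The sketch below uses a hypothetical function name and the priority queue's constructor that expects a range of iterators:

```cpp
#include <functional>
#include <queue>
#include <string>
#include <vector>

std::vector<std::string> ascending(std::vector<std::string> const &words)
{
    // std::greater makes the smallest word the top element
    std::priority_queue<std::string, std::vector<std::string>,
                        std::greater<std::string>> pq(words.begin(), words.end());

    std::vector<std::string> sorted;
    while (!pq.empty())
    {
        sorted.push_back(pq.top());
        pq.pop();
    }
    return sorted;
}
```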

The following constructors, operators, and member functions are available for the priority_queue container:

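Note that the priority queue does not support iterators or a subscript operator. The only element that can be accessed is its top element. A priority queue can be emptied by repeatedly removing its top element (using pop) until its member empty returns true, or by assigning an empty priority queue of the same type to it.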

12.3.5: The `deque' container

The deque (pronounce: `deck') class implements a double-ended queue data structure (deque). Before using a deque container the header file <deque> must have been included.

A deque is comparable to a queue, but it allows for reading and writing at both ends. Actually, the deque data type supports a lot more functionality than the queue, as illustrated by the following overview of available member functions. A deque is a combination of a vector and two queues, operating at both ends of the vector. Consider using a deque in situations where random insertions and the addition and/or removal of elements at one or both ends of the vector occur frequently.
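Here is a sketch of a deque being modified at both ends and in its middle. The function name is hypothetical. The comments show the deque's contents after each statement:

```cpp
#include <deque>

std::deque<int> bothEnds()
{
    std::deque<int> dq;
    dq.push_back(2);                // 2
    dq.push_front(1);               // 1 2
    dq.push_back(3);                // 1 2 3
    dq.push_front(0);               // 0 1 2 3
    dq.pop_back();                  // 0 1 2
    dq.insert(dq.begin() + 1, 9);   // 0 9 1 2: insertion in the middle
    return dq;
}
```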

The following constructors, operators, and member functions are available for deques:

12.3.6: The `map' container

The map class offers a (sorted) associative array. Before using a map container the <map> header file must have been included.

A map is filled with key/value pairs, which may be of any container-accepted type. Since types are associated with both the key and the value, we must specify two types in the angle bracket notation, comparable to the specification we've seen with the pair container (cf. section 12.2). The first type represents the key's type, the second type represents the value's type. For example, a map in which the key is a string and the value is a double can be defined as follows:

    map<string, double> object;
The key is used to access its associated information. That information is called the value. For example, a phone book uses the names of people as the key, and uses the telephone number and maybe other information (e.g., the zip-code, the address, the profession) as value. Since a map sorts its keys, the key's operator< must be defined, and it must be sensible to use it. For example, it is generally a bad idea to use pointers for keys, as sorting pointers is something different than sorting the values pointed at by those pointers.

The two fundamental operations on maps are the storage of Key/Value combinations, and the retrieval of values, given their keys. The index operator, using a key as the index, can be used for both. If the index operator is used as an lvalue (e.g., on the left-hand side of an assignment), the assigned value is stored in the map under the given key. If it is used as an rvalue, the key's associated value is retrieved. Each key can be stored only once in a map. If a value is assigned to a key that is already stored, the new value replaces the formerly stored value, which is lost.
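The sketch below stores, overwrites and retrieves values using the index operator. The function name prices and its keys are hypothetical:

```cpp
#include <map>
#include <string>

std::map<std::string, double> prices()
{
    std::map<std::string, double> object;
    object["apple"] = 1.25;     // lvalue: stores a key/value combination
    object["pear"] = 2.0;
    object["apple"] = 1.5;      // same key: the old value is replaced
    return object;
}
```

The returned map holds two elements, and its "apple" element has value 1.5.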

A specific key/value combination can implicitly or explicitly be inserted into a map. If explicit insertion is required, the key/value combination must be constructed first. For this, every map defines a value_type which may be used to create values that can be stored in the map. For example, a value for a map<string, int> can be constructed as follows:

    map<string, int>::value_type siValue("Hello", 1);
The value_type is associated with the map<string, int>: the type of the key is string, the type of the value is int. Anonymous value_type objects are also often used. E.g.,
    map<string, int>::value_type("Hello", 1);
Instead of using the line map<string, int>::value_type(...) over and over again, a typedef is frequently used to reduce typing and to improve readability:
    typedef map<string, int>::value_type StringIntValue;
Using this typedef, values for the map<string, int> may now be constructed using:
    StringIntValue("Hello", 1);
Alternatively, pairs may be used to represent key/value combinations used by maps:
    pair<string, int>("Hello", 1);
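Explicit insertion uses the map's insert member. It accepts value_type objects as well as pairs that can be converted to value_type. Unlike the index operator, insert does not replace the value of an existing key. It returns a pair whose bool member indicates whether the value was actually inserted. A sketch with hypothetical function names:

```cpp
#include <map>
#include <string>
#include <utility>

typedef std::map<std::string, int>::value_type StringIntValue;

std::map<std::string, int> filled()
{
    std::map<std::string, int> object;
    object.insert(StringIntValue("Hello", 1));              // value_type
    object.insert(std::pair<std::string, int>("World", 2)); // converted pair
    object.insert(StringIntValue("Hello", 99));             // ignored: key exists
    return object;
}

bool secondInsertFails()
{
    std::map<std::string, int> object;
    object.insert(StringIntValue("Hello", 1));
    // the returned pair's bool is false if the key was already stored
    return !object.insert(StringIntValue("Hello", 2)).second;
}
```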

12.3.6.1: The `map' constructors

The following constructors are available for the map container:

12.3.6.2: The `map' operators

The map supports, in addition to the standard operators for containers, the index operator.

The index operator may be used to retrieve or reassign individual elements of the map. The argument of the index operator is called a key.

If the provided key is not available in the map, a new data element is automatically added to the map using the default value or default constructor to initialize the value part of the new element. This default value is returned if the index operator is used as an rvalue.

When initializing a new or reassigning an existing element of the map, the type of the right-hand side of the assignment operator must be equal to (or convertible to) the type of the map's value part. E.g., suppose mapsm is a map<string, MyClass>, where MyClass is some default-constructible class type. To add or change the value of its element "two", the following statement can be used:

    mapsm["two"] = MyClass();
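Using the index operator as an rvalue adds missing keys. So when a value should only be looked up, the map's find member is preferable. Here is a sketch of both approaches with hypothetical function names:

```cpp
#include <map>
#include <string>

int readMissing(std::map<std::string, int> &object)
{
    return object["missing"];       // adds "missing" with value 0 if absent
}

bool findValue(std::map<std::string, int> const &object,
               std::string const &key, int &value)
{
    std::map<std::string, int>::const_iterator it = object.find(key);
    if (it == object.end())
        return false;               // not found: the map is unchanged
    value = it->second;
    return true;
}
```

readMissing returns 0 and adds an element to the map. findValue does not modify the map. It can also be used with const maps, for which the index operator is not available.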

12.3.6.3: The `map' public members

The following member functions are available for the map container:

12.3.6.4: The `map': a simple example

As mentioned at the beginning of section 12.3.6, the map represents a sorted associative array. In a map the keys are sorted. To visit all elements of a map, its begin and end iterators can be used. The elements are then visited in the order of their sorted keys.

The following example illustrates how to make a simple table listing all keys and values found in a map:

    #include <iostream>
    #include <iomanip>
    #include <map>
    #include <string>

    using namespace std;

    int main()
    {
        pair<string, int>
            pa[] =
            {
                pair<string,int>("one", 10),
                pair<string,int>("two", 20),
                pair<string,int>("three", 30),
            };
        map<string, int>
            object(&pa[0], &pa[3]);

        for
        (
            map<string, int>::iterator it = object.begin();
                it != object.end();
                    ++it
        )
            cout << setw(5) << it->first.c_str() <<
                    setw(5) << it->second << '\n';
    }
    /*
        Generated output:
      one   10
    three   30
      two   20
    */

12.3.7: The `multimap' container

Like the map, the multimap class implements a (sorted) associative array. Before using a multimap container the header file <map> must have been included.

The main difference between the map and the multimap is that the multimap supports multiple values associated with the same key, whereas the map contains single-valued keys. Note that the multimap also accepts multiple identical values associated with identical keys.

The map and the multimap have the same set of member functions, with the exception of the index operator which is not supported with the multimap. This is understandable: if multiple entries of the same key are allowed, which of the possible values should be returned for object[key]?

Refer to section 12.3.6 for an overview of the multimap member functions. Some member functions, however, deserve additional attention when used in the context of the multimap container. These members are discussed below.

Although the functions lower_bound and upper_bound act identically in the map and multimap containers, their operation in a multimap deserves some additional attention. The next example illustrates lower_bound, upper_bound and equal_range applied to a multimap:
    #include <iostream>
    #include <map>
    #include <string>
    using namespace std;

    int main()
    {
        pair<string, int> pa[] =
        {
            pair<string,int>("alpha", 1),
            pair<string,int>("bravo", 2),
            pair<string,int>("charley", 3),
            pair<string,int>("bravo", 6),   // unordered `bravo' values
            pair<string,int>("delta", 5),
            pair<string,int>("bravo", 4),
        };
        multimap<string, int> object(&pa[0], &pa[6]);

        typedef multimap<string, int>::iterator msiIterator;

        msiIterator it = object.lower_bound("brava");

        cout << "Lower bound for `brava': " <<
                it->first << ", " << it->second << '\n';

        it = object.upper_bound("bravu");

        cout << "Upper bound for `bravu': " <<
                it->first << ", " << it->second << '\n';

        pair<msiIterator, msiIterator>
            itPair = object.equal_range("bravo");

        cout << "Equal range for `bravo':\n";
        for (it = itPair.first; it != itPair.second; ++it)
            cout << it->first << ", " << it->second << '\n';
        cout << "Upper bound: " << it->first << ", " << it->second << '\n';

        cout << "Equal range for `brav':\n";
        itPair = object.equal_range("brav");
        for (it = itPair.first; it != itPair.second; ++it)
            cout << it->first << ", " << it->second << '\n';
        cout << "Upper bound: " << it->first << ", " << it->second << '\n';
    }
    /*
        Generated output:

        Lower bound for `brava': bravo, 2
        Upper bound for `bravu': charley, 3
        Equal range for `bravo':
        bravo, 2
        bravo, 6
        bravo, 4
        Upper bound: charley, 3
        Equal range for `brav':
        Upper bound: bravo, 2
    */
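In particular note the following characteristics. For keys that are not stored in the multimap (like brava, bravu and brav), lower_bound and upper_bound both return an iterator to the first element whose key exceeds the provided key. Next, equal_range("bravo") visits all elements having key bravo, in the order in which they were inserted. Finally, for a key that is not stored (like brav), equal_range returns an empty range, whose iterators both point at the first element whose key exceeds the provided key.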

12.3.8: The `set' container

The set class implements a sorted collection of values. Before using set containers the <set> header file must have been included.

A set contains unique values (of a container-acceptable type). Each value is stored only once.

A specific value can be explicitly created: Every set defines a value_type which may be used to create values that can be stored in the set. For example, a value for a set<string> can be constructed as follows:

    set<string>::value_type setValue("Hello");
The value_type is associated with the set<string>. Anonymous value_type objects are also often used. E.g.,
    set<string>::value_type("Hello");
Instead of using the line set<string>::value_type(...) over and over again, a typedef is often used to reduce typing and to improve readability:
    typedef set<string>::value_type StringSetValue;
Using this typedef, values for the set<string> may be constructed as follows:
    StringSetValue("Hello");
Alternatively, values of the set's type may be used immediately: since set<Type>::value_type is simply Type, no conversion is required.
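Here is a sketch of a set storing each value only once. It uses plain values as well as a value_type object, and the function name is hypothetical:

```cpp
#include <set>
#include <string>

std::set<std::string> uniqueWords()
{
    std::set<std::string> object;
    object.insert("mike");
    object.insert("alpha");
    object.insert("mike");          // already present: not stored again
    object.insert(std::set<std::string>::value_type("echo"));
    return object;
}
```

The returned set contains alpha, echo and mike, and its iterators visit them in that (sorted) order.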

The following constructors, operators, and member functions are available for the set container:

12.3.9: The `multiset' container

Like the set, the multiset class implements a sorted collection of values. Before using multiset containers the header file <set> must have been included.

The main difference between the set and the multiset is that the multiset supports multiple entries of the same value, whereas the set contains unique values.

The set and the multiset have the same set of member functions. Refer to section 12.3.8 for an overview of the multiset member functions. Some member functions, however, behave slightly differently from their counterparts of the set container. Those members are mentioned here.

Although the functions lower_bound and upper_bound act identically in the set and multiset containers, their operation in a multiset deserves some additional attention. With a multiset container lower_bound and upper_bound produce the same result for non-existing keys: they both return the first element having a key exceeding the provided key.

Here is an example showing the use of various member functions of a multiset:

    #include <iostream>
    #include <set>
    #include <string>

    using namespace std;

    int main()
    {
        string
            sa[] =
            {
                "alpha",
                "echo",
                "hotel",
                "mike",
                "romeo"
            };

        multiset<string>
            object(&sa[0], &sa[5]);

        object.insert("echo");
        object.insert("echo");

        multiset<string>::iterator
            it = object.find("echo");

        for (; it != object.end(); ++it)
            cout << *it << " ";
        cout << '\n';

        cout << "Multiset::equal_range(\"ech\")\n";
        pair
        <
            multiset<string>::iterator,
            multiset<string>::iterator
        >
            itpair = object.equal_range("ech");

        if (itpair.first != object.end())
            cout << "lower_bound() points at " << *itpair.first << '\n';
        for (; itpair.first != itpair.second; ++itpair.first)
            cout << *itpair.first << " ";

        cout << '\n' <<
                object.count("ech") << " occurrences of 'ech'" << '\n';

        cout << "Multiset::equal_range(\"echo\")\n";
        itpair = object.equal_range("echo");

        for (; itpair.first != itpair.second; ++itpair.first)
            cout << *itpair.first << " ";

        cout << '\n' <<
                object.count("echo") << " occurrences of 'echo'" << '\n';

        cout << "Multiset::equal_range(\"echoo\")\n";
        itpair = object.equal_range("echoo");

        for (; itpair.first != itpair.second; ++itpair.first)
            cout << *itpair.first << " ";

        cout << '\n' <<
                object.count("echoo") << " occurrences of 'echoo'" << '\n';
    }
    /*
        Generated output:

        echo echo echo hotel mike romeo
        Multiset::equal_range("ech")
        lower_bound() points at echo

        0 occurrences of 'ech'
        Multiset::equal_range("echo")
        echo echo echo
        3 occurrences of 'echo'
        Multiset::equal_range("echoo")

        0 occurrences of 'echoo'
    */

12.3.10: The `stack' container

The stack class implements a stack data structure. Before using stack containers the header file <stack> must have been included.

A stack is also called a first in, last out (FILO or LIFO) data structure as the first item to enter the stack is the last item to leave. A stack is an extremely useful data structure in situations where data must temporarily remain available. For example, programs maintain a stack to store local variables of functions: the lifetime of these variables is determined by the time these functions are active, contrary to global (or static local) variables, which live for as long as the program itself lives. Another example is found in calculators using the Reverse Polish Notation (RPN), in which the operands of operators are kept in a stack, whereas operators pop their operands off the stack and push the results of their work back onto the stack.

As an example of the use of a stack, consider figure 12, in which the contents of the stack are shown while the expression (3 + 4) * 2 is evaluated. In the RPN this expression becomes 3 4 + 2 *, and figure 12 shows the stack contents after each token (i.e., the operands and the operators) is read from the input. Notice that each operand is indeed pushed on the stack, while each operator changes the contents of the stack.

Figure 12: The contents of a stack while evaluating 3 4 + 2 *


The expression is evaluated in five steps. The caret between the tokens in the expressions shown on the first line of figure 12 shows what token has just been read. The next line shows the actual stack-contents, and the final line shows the steps for referential purposes. Note that at step 2, two numbers have been pushed on the stack. The first number (3) is now at the bottom of the stack. Next, in step 3, the + operator is read. The operator pops two operands (so that the stack is empty at that moment), calculates their sum, and pushes the resulting value (7) on the stack. Then, in step 4, the number 2 is read, which is dutifully pushed on the stack again. Finally, in step 5 the final operator * is read, which pops the values 2 and 7 from the stack, computes their product, and pushes the result back on the stack. This result (14) could then be popped to be displayed on some medium.
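You can sketch the evaluation shown in figure 12 using a stack<int>. The function below is a hypothetical illustration. It assumes a well-formed, space-separated expression containing only non-negative integers and the operators + and *, and it performs no error checking. std::stoi requires a C++0x (C++11) compiler:

```cpp
#include <sstream>
#include <stack>
#include <string>

// Assumes a well-formed expression: no error checking is performed.
int evalRpn(std::string const &expression)
{
    std::istringstream in(expression);
    std::stack<int> operands;
    std::string token;

    while (in >> token)
    {
        if (token == "+" || token == "*")
        {
            int right = operands.top();     // an operator pops two operands
            operands.pop();
            int left = operands.top();
            operands.pop();
            operands.push(token == "+" ? left + right : left * right);
        }
        else
            operands.push(std::stoi(token));    // an operand is pushed
    }
    return operands.top();                  // the result remains on the stack
}
```

evalRpn("3 4 + 2 *") returns 14, following the five steps of figure 12.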

From figure 12 we see that a stack has one location (the top) where items can be pushed onto and popped off the stack. This top element is the stack's only immediately visible element. It may be accessed and modified directly.

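Bearing this model of the stack in mind, let's see what we formally can do with the stack container. The stack offers the usual (default, copy and move) constructors, copy and move assignment, and the comparison operators. Its member functions include empty and size, push (adding an element at the top), pop (removing the top element without returning it) and top (returning a reference to the top element).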

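The stack does not support iterators or a subscript operator. The only element that can be accessed is its top element. To empty a stack, either repeatedly pop its top element, assign an empty stack of the same type to it, or have its destructor called.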
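A minimal sketch of emptying a stack. The helper names popAll and assignEmpty are hypothetical, introduced only for this illustration.

    #include <cassert>
    #include <stack>
    #include <string>

    // Empties a stack by repeatedly popping its top element
    void popAll(std::stack<std::string> &st)
    {
        while (!st.empty())
            st.pop();
    }

    // Empties a stack by assigning an empty stack of the same type to it
    void assignEmpty(std::stack<std::string> &st)
    {
        st = std::stack<std::string>();
    }

    int main()
    {
        std::stack<std::string> st;
        st.push("one");
        st.push("two");
        popAll(st);
        assert(st.empty());

        st.push("three");
        assignEmpty(st);
        assert(st.size() == 0);

        {
            std::stack<std::string> local;
            local.push("four");
        }   // local's destructor is called here, destroying its elements
    }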

12.3.11: Unordered containers (`hash tables') (C++0x)

The C++0x standard officially adds hash tables to the language.

Before using hash table containers the header file <unordered_map> must have been included. Variants are hash-based sets and multi-sets. Before using these hash-based set containers the header file <unordered_set> must have been included.

As discussed, the map is a sorted data structure. The keys in maps are sorted using the operator< of the key's data type. Generally, this is not the fastest way to either store or retrieve data. The main benefit of sorting is that a listing of sorted keys appeals more to humans than an unsorted list. However, hashing offers a much faster method to store and retrieve data.

Hashing uses a function (called the hash function) to compute an (unsigned) number from the key. This number (usually reduced modulo the table's size) is then used as an index into the table in which the keys are stored. Retrieving a key is as simple as computing the hash value of the provided key and looking in the table at the computed index location: if the key is found there, its value can be returned; if not, the key isn't stored in the table.

Collisions occur when a computed index position is already occupied by another element. For these situations the abstract containers have solutions available. A simple solution, adopted by the C++0x standard, is chaining (also known as separate chaining): colliding table elements are stored in a linked list.
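To make hashing and chaining concrete, here is a toy hash set. ChainedSet and its members are hypothetical names used only for this sketch; the standard's unordered containers are considerably more elaborate (for example, they increase their number of slots as they grow). The sketch uses std::hash to compute the hash value and a linked list per slot to store colliding keys.

    #include <cassert>
    #include <cstddef>
    #include <functional>
    #include <list>
    #include <string>
    #include <vector>

    // Toy hash set using chaining: each slot holds a list of the keys
    // whose hash values map onto that slot (hypothetical class)
    class ChainedSet
    {
        std::vector<std::list<std::string>> d_slots;

        std::size_t index(std::string const &key) const
        {
            // hash value reduced modulo the number of slots
            return std::hash<std::string>()(key) % d_slots.size();
        }

        public:
            explicit ChainedSet(std::size_t nSlots)     // nSlots must be > 0
            :
                d_slots(nSlots)
            {}

            void insert(std::string const &key)
            {
                if (!contains(key))     // colliding keys share one list
                    d_slots[index(key)].push_back(key);
            }

            bool contains(std::string const &key) const
            {
                for (auto const &stored: d_slots[index(key)])   // one chain
                    if (stored == key)
                        return true;
                return false;
            }
    };

    int main()
    {
        ChainedSet set(3);          // 4 keys, 3 slots: collisions must occur
        set.insert("january");
        set.insert("february");
        set.insert("march");
        set.insert("april");

        assert(set.contains("march"));
        assert(!set.contains("may"));
    }

Only the list of the slot computed from the key is searched, which is why lookups stay fast as long as the lists remain short.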

In the C++0x standard the term unordered is used rather than hash to avoid name collisions with hash tables developed before the advent of the C++0x standard. Except where unordered is required as part of a type name, we'll use the term hash as it is the term that is commonly encountered.

Four forms of unordered data structures are supported: unordered_map, unordered_multimap, unordered_set, and unordered_multiset.

Below the unordered_map container is discussed. The other hash-based containers provide functionality corresponding to, respectively, the multimap, set and multiset.

Concentrating on the unordered_map, its template type parameters are a key type, a value type, a type whose objects compute a hash value for the key, and a type whose objects compare two keys for equality. The last two have default types, and predefined hash functions are available for std::string keys and for all standard scalar numeric types (char, short, int etc.). If another key type is used, a hash function object and (unless the key type offers operator==) an equality function object must be made available (see also section 10.10). Examples follow below.

The standard library's predefined hash function objects are provided by the class template hash. A hash function object's function call operator (operator()) returns the (size_t) hash value of the key that it receives as its argument.

A predefined function object performing tests of equality, equal_to, is available (see chapter 19). It can be used if the key's data type supports the equality operator. Alternatively, an overloaded operator== or a specialized function object could be constructed, returning true if two keys are equal and false otherwise. Examples follow.
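A minimal sketch using a hypothetical key type Point, for which no predefined hash function exists. PointHash supplies the hash function object, and an overloaded operator== allows the default fourth template argument, equal_to<Point>, to perform the equality tests. The formula combining the two coordinates' hash values is an arbitrary choice made for this illustration.

    #include <cassert>
    #include <cstddef>
    #include <functional>
    #include <string>
    #include <unordered_map>

    struct Point                    // hypothetical key type
    {
        int x;
        int y;
    };

    // with this operator== the default equal_to<Point> can be used
    bool operator==(Point const &lhs, Point const &rhs)
    {
        return lhs.x == rhs.x && lhs.y == rhs.y;
    }

    // hash function object combining the predefined hashes of the ints
    struct PointHash
    {
        std::size_t operator()(Point const &point) const
        {
            return std::hash<int>()(point.x) * 31 + std::hash<int>()(point.y);
        }
    };

    int main()
    {
        // the fourth template argument defaults to std::equal_to<Point>
        std::unordered_map<Point, std::string, PointHash> names;

        names[Point{0, 0}] = "origin";
        names[Point{1, 2}] = "somewhere";

        assert(names[Point{0, 0}] == "origin");
        assert(names.size() == 2);
    }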

The unordered_map class implements an associative array in which the elements are stored according to some hashing scheme.

Most constructors, operators and member functions available for the map are also available for the unordered_map; members depending on the ordering of the keys (like lower_bound and upper_bound) are not. Lookups and insertions in an unordered_map take constant time on average, compared to the map's logarithmic time, so an unordered_map is usually considerably faster than a map. Comparable conclusions may be drawn for the unordered_set, unordered_multimap and the unordered_multiset.
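A minimal sketch of some of the map-like members offered by the unordered_map. The helper daysIn is a hypothetical name; it uses find rather than the subscript operator, as the subscript operator inserts keys that aren't yet stored.

    #include <cassert>
    #include <string>
    #include <unordered_map>
    #include <utility>

    // Returns the number of days of month, or 0 if month isn't stored
    // (hypothetical helper). Unlike operator[], find never inserts keys.
    int daysIn(std::unordered_map<std::string, int> const &days,
               std::string const &month)
    {
        auto iter = days.find(month);
        return iter == days.end() ? 0 : iter->second;
    }

    int main()
    {
        std::unordered_map<std::string, int> days;

        days.insert(std::make_pair(std::string("april"), 30));  // as with map
        days["may"] = 31;                   // operator[] inserts absent keys

        assert(daysIn(days, "april") == 30);
        assert(daysIn(days, "june") == 0);
        assert(days.count("june") == 0);    // "june" was not inserted

        days.erase("may");                  // erase by key
        assert(days.size() == 1);
    }

Unlike a map, iterating over an unordered_map visits its elements in an unspecified order.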

Compared to the map container, the unordered_map has an additional constructor:

        unordered_map<...> hash(n);
where n is a size_t value. It is used to construct an unordered_map consisting of an initial number of at least n empty slots (buckets) to put key/value combinations in. This number is automatically extended when needed.
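A small sketch of this constructor. The member bucket_count reports the number of slots (buckets) actually allocated; only a lower bound of n is guaranteed, so the printed number may differ between implementations.

    #include <cassert>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    int main()
    {
        // construct the table with at least 100 buckets ("slots")
        std::unordered_map<std::string, int> table(100);

        assert(table.bucket_count() >= 100);
        assert(table.empty());          // only buckets were created

        std::cout << "buckets: " << table.bucket_count() << '\n';

        table["january"] = 31;          // the table is used as usual
        assert(table.size() == 1);
    }

The members rehash and reserve may also be used to request more buckets after construction.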

In practice the key type is very often text, so an unordered_map whose key's data type is std::string occurs most often. Note that although a char * is allowed as key type, this is almost always a bad idea: by default the pointers rather than the C-strings they point to are hashed and compared, so two char * variables pointing to equal C-strings stored at different locations are considered to represent different keys.
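The following sketch shows the problem. The helper keysForEqualStrings is hypothetical; it uses the default hash and equality function objects for char const * keys, which operate on the pointer values, so two equal C-strings at different addresses end up as two distinct keys.

    #include <cassert>
    #include <cstddef>
    #include <iostream>
    #include <unordered_map>

    // Uses two equal C-strings, stored at different addresses, as keys
    // and returns the resulting number of keys (hypothetical helper)
    std::size_t keysForEqualStrings()
    {
        char first[] = "key";
        char second[] = "key";

        // hash<char const *> and equal_to<char const *> operate on the
        // pointers' values, not on the characters they point to
        std::unordered_map<char const *, int> table;
        table[first] = 1;
        table[second] = 2;

        return table.size();
    }

    int main()
    {
        std::cout << keysForEqualStrings() << '\n';     // 2: distinct keys
        assert(keysForEqualStrings() == 2);
    }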

The following program defines an unordered_map containing the names of the months of the year and the number of days these months (usually) have. Then, using the subscript operator, the days in several months are displayed. The equality tests use the function object equal_to<string>, which is the unordered_map's default fourth template argument:

    #include <unordered_map>
    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        unordered_map<string, int> months;

        months["january"] = 31;
        months["february"] = 28;
        months["march"] = 31;
        months["april"] = 30;
        months["may"] = 31;
        months["june"] = 30;
        months["july"] = 31;
        months["august"] = 31;
        months["september"] = 30;
        months["october"] = 31;
        months["november"] = 30;
        months["december"] = 31;

        cout << "september -> " << months["september"] << '\n' <<
                "april     -> " << months["april"] << '\n' <<
                "june      -> " << months["june"] << '\n' <<
                "november  -> " << months["november"] << '\n';
    }
    /*
        Generated output:
    september -> 30
    april     -> 30
    june      -> 30
    november  -> 30
    */

A comparable example, showing the use of explicitly defined hash and equality functions and key-type char const *:

    #include <unordered_map>
    #include <iostream>
    #include <string>
    #include <cstring>
    using namespace std;

    struct EqualCp          // keys are equal if their C-strings are equal
    {
        bool operator()(char const *l, char const *r) const
        {
            return strcmp(l, r) == 0;
        }
    };
    struct HashCp           // hashes the C-string's text, not the pointer
    {
        size_t operator()(char const *str) const
        {
            return hash<string>()(str);
        }
    };
    int main()
    {
        unordered_map<char const *, int, HashCp, EqualCp> months;

        months["april"] = 30;
        months["november"] = 30;

        string apr("april");    // different pointers, same string

        cout << "april     -> " << months["april"] << '\n' <<
                "april     -> " << months[apr.c_str()] << '\n';
    }
    /*
        Generated output:
    april     -> 30
    april     -> 30
    */

The unordered_multimap, unordered_set and unordered_multiset containers are used analogously. For these containers, too, hash and equality function objects must be defined if the key type lacks predefined ones. The unordered_multimap is declared in the unordered_map header file, and the unordered_multiset in the unordered_set header file.
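A minimal sketch of the other hash-based containers, using std::string and int keys, for which predefined hash functions exist. The helpers distinct and occurrences are hypothetical names used for this illustration.

    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <unordered_map>        // also declares unordered_multimap
    #include <unordered_set>        // unordered_set and unordered_multiset
    #include <utility>
    #include <vector>

    // an unordered_set stores each key only once (hypothetical helper)
    std::size_t distinct(std::vector<std::string> const &words)
    {
        std::unordered_set<std::string> set(words.begin(), words.end());
        return set.size();
    }

    // an unordered_multiset may store a key repeatedly (hypothetical helper)
    std::size_t occurrences(std::vector<std::string> const &words,
                            std::string const &word)
    {
        std::unordered_multiset<std::string> bag(words.begin(), words.end());
        return bag.count(word);
    }

    int main()
    {
        std::vector<std::string> words{"april", "may", "april"};
        assert(distinct(words) == 2);
        assert(occurrences(words, "april") == 2);

        // an unordered_multimap may associate multiple values with a key
        std::unordered_multimap<int, std::string> byDays;
        byDays.insert(std::make_pair(30, std::string("april")));
        byDays.insert(std::make_pair(30, std::string("june")));
        byDays.insert(std::make_pair(31, std::string("may")));
        assert(byDays.count(30) == 2);
    }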

12.3.12: Regular Expressions (C++0x, ?)

The C++0x standard adds handling of regular expressions to the language. Before using regular expressions as offered by the C++ standard the header file <regex> must have been included.

Regular expressions were already available in C++ through its C heritage, as POSIX systems offer C functions like regcomp and regexec, which are used by, e.g., the Pattern class of the Bobcat library.

Regular expressions are extensively documented elsewhere (e.g., regex(7); J.E.F. Friedl, Mastering Regular Expressions, O'Reilly), and the reader is referred to these sources for a refresher on the topic of regular expressions.

The C++0x standard adds native object based support for regular expressions by defining several new classes and other facilities. Currently, however, regular expressions are not yet supported by the g++ library and therefore in this section only the basic building blocks the C++0x standard offers to handle regular expressions are mentioned. Once regular expressions actually become available this section will be updated to cover the actually available features.

Regular expressions are represented by objects of the class regex. Once a regex object has been defined, the function regex_search can be called to match text against its regular expression. This function expects arguments representing, respectively, the text which must be matched against the regular expression; an object of the class cmatch receiving the results of the matching operation; and the regex object representing the regular expression itself. Furthermore, a function regex_replace is available, performing textual replacements based on regular expressions.
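As the g++ library didn't yet support these facilities when this section was written, the following is only a sketch, assuming a standard library that implements the <regex> header. The helpers findYear and toDayMonthYear and the regex object s_date are hypothetical names. The sketch shows regex_search filling a cmatch object, and regex_replace using the $n notation to refer to parenthesized subexpressions.

    #include <cassert>
    #include <iostream>
    #include <regex>
    #include <string>

    // a year-month-day date, e.g., 2011-10-04; the parenthesized
    // subexpressions capture the year, month and day
    std::regex const s_date("([0-9]{4})-([0-9]{2})-([0-9]{2})");

    // stores the year of the first date found in text in year, returning
    // false if text contains no such date (hypothetical helper)
    bool findYear(char const *text, std::string &year)
    {
        std::cmatch match;                  // match results for char const *
        if (!std::regex_search(text, match, s_date))
            return false;
        year = match.str(1);                // text of the 1st subexpression
        return true;
    }

    // rewrites dates as day/month/year (hypothetical helper)
    std::string toDayMonthYear(std::string const &text)
    {
        return std::regex_replace(text, s_date, std::string("$3/$2/$1"));
    }

    int main()
    {
        std::string year;
        if (findYear("meeting on 2011-10-04.", year))
            std::cout << "year: " << year << '\n';      // year: 2011

        std::cout << toDayMonthYear("meeting on 2011-10-04.") << '\n';
    }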

Regular expressions using the regex class are currently not yet available in the g++ library.

12.4: The `complex' container

The complex container defines the standard operations that can be performed on complex numbers. Before using complex containers the header file <complex> must have been included.

The complex number's real and imaginary types are specified as the container's data type. Examples:

    complex<double>
    complex<int>
    complex<float>
Note that the real and imaginary parts of complex numbers have the same datatypes.

When initializing (or assigning) a complex object, the imaginary part may be omitted from the initialization or assignment resulting in its value being 0 (zero). By default, both parts are zero.

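Below it is silently assumed that the used complex type is complex<double>. Given this assumption, complex numbers may be initialized in several ways: by default (both parts are zero), from a single real value (the imaginary part is zero), from a real and an imaginary value, or as a copy of another complex object.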
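A minimal sketch of these initializations, also showing that assigning a scalar value sets the imaginary part to zero, and that a complex object is inserted into a stream as (real,imaginary).

    #include <cassert>
    #include <complex>
    #include <iostream>

    int main()
    {
        std::complex<double> zero;              // both parts are 0
        std::complex<double> real(1.5);         // imaginary part is 0
        std::complex<double> full(1.5, 2.5);    // real 1.5, imaginary 2.5
        std::complex<double> copy(full);        // a copy of full

        assert(zero.real() == 0 && zero.imag() == 0);
        assert(real.imag() == 0);
        assert(copy == full);

        zero = 3.0;                 // assigning a scalar: imaginary part is 0
        assert(zero.real() == 3 && zero.imag() == 0);

        std::cout << full << '\n';  // (1.5,2.5)
    }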

Anonymous complex values may also be used. In the next example two anonymous complex values are pushed on a stack of complex numbers, to be popped again thereafter:
    #include <iostream>
    #include <complex>
    #include <stack>

    using namespace std;

    int main()
    {
        stack<complex<double>>
            cstack;

        cstack.push(complex<double>(3.14, 2.71));
        cstack.push(complex<double>(-3.14, -2.71));

        while (cstack.size())
        {
            cout << cstack.top().real() << ", " <<
                    cstack.top().imag() << "i" << '\n';
            cstack.pop();
        }
    }
    /*
        Generated output:
    -3.14, -2.71i
    3.14, 2.71i
    */

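Complex numbers support the arithmetic operators (+, -, *, /) and their compound assignment variants, where an operand may be either a primitive scalar value (of the complex's own value type, e.g., double) or a complex object; the operators == and != (but no ordering operators); insertion into and extraction from streams; the member functions real and imag; and free functions like abs, arg, conj, norm, polar, exp, log, pow and sqrt.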
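A minimal sketch of some of these operations. Note that the scalar operand is written as 2.0: the operators' templated signatures expect a value of the complex's own value type (here double).

    #include <cassert>
    #include <cmath>
    #include <complex>
    #include <iostream>

    int main()
    {
        std::complex<double> lhs(1, 2);
        std::complex<double> rhs(3, -1);

        std::complex<double> sum = lhs + rhs;       // (4, 1)
        std::complex<double> product = lhs * rhs;   // (3 + 2, -1 + 6): (5, 5)
        std::complex<double> scaled = lhs * 2.0;    // scalar operand: (2, 4)

        lhs += rhs;                                 // compound assignment
        assert(lhs == sum);                         // == and != are defined
        assert(product == std::complex<double>(5, 5));
        assert(scaled == std::complex<double>(2, 4));

        std::complex<double> value(3, 4);
        std::cout << value.real() << ' ' << value.imag() << '\n';  // 3 4
        std::cout << std::conj(value) << '\n';      // (3,-4)
        std::cout << std::norm(value) << '\n';      // squared magnitude
        assert(std::abs(std::abs(value) - 5.0) < 1e-12);    // magnitude: 5
    }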

12.5: Unrestricted Unions (C++0x, 4.6)

We end this chapter on abstract containers with a small detour, introducing additions to the union concept, made available by the C++0x standard. Although unions should not themselves be considered abstract containers, having covered these containers has put us in a good position to introduce and illustrate these new additions.

The C++0x standard adds unrestricted unions to C++'s data structuring capabilities. Whereas the traditional union can only contain primitive data, unrestricted unions allow data fields of types for which non-trivial constructors have been defined. Such data fields commonly are of class-types. Here is an example of such an unrestricted union:

    union Union
    {
        int u_int;
        std::complex<double> u_complex;
        std::string u_string;
    };
Two of the three fields of this union have non-trivial constructors, turning this union into an unrestricted union. As an unrestricted union defines at least one field of a type having a non-trivial constructor, the question becomes how such unions can be constructed and destroyed.

The destructor of a union consisting of, e.g. a std::string and a double should of course not call the string's destructor if the union's last (or only) use referred to its double field. Likewise, when the std::string field is being used, but a switch is made from the std::string to the double field the std::string's destructor should be called before any assignment to the double field.

These tasks are too difficult for the compiler to solve, and the compiler will therefore not implement default constructors and destructors for unrestricted unions, leaving the implementations of the union's constructors and destructor to the software engineer. If we try to define an unrestricted union like the above one using its default constructor we see an error message like the following:

    error: use of deleted function 'Union::Union()'
    error: 'Union::Union()' is implicitly deleted because the default
            definition would be ill-formed:
    error: union member 'Union::u_string' with non-trivial
            'std::basic_string<...>::basic_string() ...'

Although the compiler is unable to provide the appropriate member functions, we can, as we know in what context the union is used. So let's create some constructors, where the various constructors each pick a field to initialize:

    Union::Union(int i)
    :
        u_int(i)
    {}
    Union::Union(double real, double imaginary)
    :
        u_complex(real, imaginary)
    {}
    Union::Union(std::string const &str)
    :
        u_string(str)
    {}
As with its constructors, the compiler doesn't implement the unrestricted union's destructor either: it is too complex for the compiler to determine which field was last used and to have the destructor act accordingly. As with the constructors, we must implement the unrestricted union's destructor ourselves.

The destructor should destroy u_string's data if that is the currently active field, u_complex's data if that is the currently active field, and do nothing if u_int is the currently active field. But how would the destructor know which field is active? As things are set up, it can't: the union contains no information about its currently used field.

Here is one way to solve this problem:

Assume we provide each field with a tag that is unique for its field. Conceptually this is easily done by prefixing each field with an int tag. As all fields of a union share the same memory, the tags of the fields coincide, and a destructor can simply inspect the tag to find out which field is being used. The tag-fields must therefore be parts of the data fields themselves.

The std::pair containers can be used to implement this scheme, using their first data members as int tags, and their second data members as the data types proper. Here are the definitions of the union's data fields and their constructors:

    union Union
    {
        std::pair<int, int> u_int;
        std::pair<int, std::complex<double>> u_complex;
        std::pair<int, std::string> u_string;

        // member declarations here
    };
    Union::Union(int i)
    :
        u_int(1, i)
    {}
    Union::Union(double real, double imaginary)
    :
        u_complex(2, {real, imaginary})
    {}
    Union::Union(std::string const &str)
    :
        u_string(3, str)
    {}

Now for the destructor: the destructor should call the appropriate destructor of the currently active data fields having non-trivial destructors. So, in our Union union the u_int field can be ignored. The destructor now has an easy job: depending on the (any) field's first data member it picks the right destructor. Since the data fields weren't dynamically allocated, we don't use delete. Instead we use the procedure previously encountered with placement new, explicitly calling the field's destructor where needed:

    Union::~Union()
    {
        switch (u_int.first)
        {
            case 2:
                u_complex.second.~complex<double>();
            break;

            case 3:
                u_string.second.~string();
            break;
        }
    }

What about changing fields? Once an unrestricted union has received its value it keeps that variant until used otherwise. That's in line with the traditional union. But to change the interpretation of an unrestricted union's class-type field we must make sure that the destructor of that class type field is first called. To do that smoothly we need operator=.

Once again we encounter a complication: unrestricted unions don't even have a default copy constructor. That's again understandable: which field should be initialized by the copy constructor? But as we've implemented a means of determining the currently active field the problem is easily solved, once we realize that the memory in the destination object is waiting to be used and that placement new is the tool to perform the initialization with. Here is Union's copy constructor:

    Union::Union(Union const &other)
    {
        switch (other.u_int.first)
        {
            case 1:
                new (&u_int) std::pair<int, int>(other.u_int);
            break;
            case 2:
                new (&u_complex)
                        std::pair<int, complex<double>>(other.u_complex);
            break;
            case 3:
                new (&u_string) std::pair<int, string>(other.u_string);
            break;
        }
    }

Preparing for operator= we first implement Union::swap, swapping the current and another Union object, using fast swapping as discussed in section 8.6.1.1:

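    void Union::swap(Union &other)
    {
        // fast swapping: exchange the raw bytes of the two objects
        // (memcpy is declared in <cstring>)
        char buffer[sizeof(Union)];
        memcpy(buffer, &other, sizeof(Union));
        memcpy(&other, this, sizeof(Union));
        memcpy(this, buffer, sizeof(Union));
    }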
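If fast swapping cannot be used, swapping is more complex: if both objects use the same field, those fields can simply be swapped; otherwise each object's active field must be destroyed and then reconstructed (using placement new) from a copy of the other object's active field.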
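A self-contained sketch of such a swap, not necessarily the author's implementation. The private helpers destroy and copy, the typedefs and the accessors tag, asInt and asString are assumptions made for this illustration. As in the text, the tag is read through the u_int field whatever field is active. Unlike the text's destructor, destroy ends the lifetime of the complete active pair rather than only its second member. For simplicity this sketch always copies, destroys and reconstructs, even when both objects use the same field.

    #include <cassert>
    #include <complex>
    #include <new>
    #include <string>
    #include <utility>

    union Union     // tags: 1: int, 2: complex, 3: string
    {
        typedef std::pair<int, int> IntField;
        typedef std::pair<int, std::complex<double>> ComplexField;
        typedef std::pair<int, std::string> StringField;

        IntField u_int;
        ComplexField u_complex;
        StringField u_string;

        Union(int value): u_int(1, value) {}
        Union(double re, double im): u_complex(2, std::complex<double>(re, im)) {}
        Union(std::string const &str): u_string(3, str) {}
        Union(Union const &other) { copy(other); }
        ~Union() { destroy(); }

        void swap(Union &other)
        {
            if (this == &other)
                return;
            Union tmp(other);       // copy other's active field
            other.destroy();        // end other's active field and
            other.copy(*this);      // give other a copy of our field
            destroy();              // end our own active field and
            copy(tmp);              // take over other's original field
        }

        int tag() const { return u_int.first; }    // shared int tag
        int asInt() const { return u_int.second; }
        std::string const &asString() const { return u_string.second; }

        private:
            void destroy()          // destroys the active class-type field
            {
                switch (u_int.first)
                {
                    case 2: u_complex.~ComplexField(); break;
                    case 3: u_string.~StringField();   break;
                }
            }
            void copy(Union const &other)   // constructs a copy of other's
            {                               // active field in unused memory
                switch (other.u_int.first)
                {
                    case 1: new (&u_int) IntField(other.u_int); break;
                    case 2: new (&u_complex) ComplexField(other.u_complex); break;
                    case 3: new (&u_string) StringField(other.u_string); break;
                }
            }
    };

    int main()
    {
        Union number(12);
        Union text(std::string("hello"));

        number.swap(text);
        assert(number.tag() == 3 && number.asString() == "hello");
        assert(text.tag() == 1 && text.asInt() == 12);
    }

Each field is destroyed before its memory is reused by placement new, so no class-type field is ever overwritten while it is still alive.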

The operator= member is now implemented using the move-assignment operator, which can be implemented en passant, whether or not the Union supports moving:

    Union &Union::operator=(Union &&tmp)
    {
        swap(tmp);
        return *this;
    }

    Union &Union::operator=(Union const &other)
    {
        Union tmp(other);
        return *this = std::move(tmp);
    }

Using simple accessor members to access the field's second data members like the one shown below, a program can now define, copy construct and reassign Unions, and access the active data using accessors:

    std::string &Union::asString()
    {
        return u_string.second;
    }

    int main()
    {
        Union ustr("hello world");
        Union ucom(12.4, 12.5);

        ustr = ucom;
        ustr = Union("hi there, again");
        cout << ustr.asString() << '\n';
    }