Chapter 23: Concrete Examples Of C++

In this chapter several concrete examples of C++ programs, classes and templates are presented. Topics covered by this document, such as virtual functions, static members, etc., are illustrated here. The examples roughly follow the organization of earlier chapters.

First, examples using stream classes are presented, including some detailed examples illustrating polymorphism. With the advent of the ANSI/ISO standard, classes supporting streams based on file descriptors are no longer available, including the Gnu procbuf extension. These classes were frequently used in older C++ programs. This section of the C++ Annotations develops an alternative: classes extending streambuf, allowing the use of file descriptors, and classes around the fork() system call.

Finally, we'll touch on the subjects of scanner and parser generators, and show how these tools can be used in C++ programs. These final examples assume some familiarity with the concepts underlying these tools, like grammars, parse-trees and parse-tree decoration. Once the input for a program exceeds a certain level of complexity, it becomes advantageous to use scanner- and parser-generators to produce code performing the actual input recognition. One of the examples in this chapter describes the use of these tools in a C++ environment.

23.1: Distinguishing lvalues from rvalues with operator[]()

(See the example source file concrete/lvalues/lvalues.cc.)
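
Although the example's source is not reproduced here, the underlying technique can be sketched. In the sketch below (class and member names are merely illustrative, not the example's actual code) operator[]() returns a Proxy object: assigning to the Proxy indicates lvalue use, while converting the Proxy to a double indicates rvalue use:

    #include <iostream>
    #include <vector>

    class Storage
    {
        std::vector<double> d_data;

        public:
            class Proxy;
            friend class Proxy;

            class Proxy
            {
                Storage &d_storage;
                size_t   d_index;

                public:
                    Proxy(Storage &storage, size_t index)
                    :
                        d_storage(storage),
                        d_index(index)
                    {}
                    Proxy &operator=(double value)  // lvalue use: storage[i] = ...
                    {
                        std::cout << "lvalue use\n";
                        d_storage.d_data[d_index] = value;
                        return *this;
                    }
                    operator double() const         // rvalue use: ... = storage[i]
                    {
                        std::cout << "rvalue use\n";
                        return d_storage.d_data[d_index];
                    }
            };

            Storage(size_t size)
            :
                d_data(size)
            {}

            Proxy operator[](size_t index)          // returns a Proxy object
            {
                return Proxy(*this, index);
            }
    };

    int main()
    {
        Storage storage(5);

        storage[3] = 12.5;              // shows: lvalue use
        double d = storage[3];          // shows: rvalue use
        std::cout << d << '\n';         // shows: 12.5
    }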

23.2: Using file descriptors with `streambuf' classes

23.2.1: Classes for output operations

Extensions to the ANSI/ISO standard may be available allowing us to read from and/or write to file descriptors. However, such extensions are not standard, and may thus vary or be unavailable across compilers and/or compiler versions. On the other hand, a file descriptor can be considered a device. So it seems natural to use the class streambuf as the starting point for constructing classes interfacing file descriptors.

In this section we will construct classes which may be used to write to a device identified by a file descriptor: it may be a file, but it could also be a pipe or socket. Section 23.2.2 discusses reading from devices given their file descriptors, while section 23.4.1 reconsiders redirection, discussed earlier in section 6.6.1.

Basically, deriving a class for output operations is simple. The only member function that must be overridden is the virtual member int overflow(int c). This member is responsible for writing characters to the device once the class's buffer is full. If fd is a file descriptor to which information may be written and we decide against using a buffer, then the member overflow() can simply be:

    #include <unistd.h>     // provides write(2)
    #include <cstdio>       // provides EOF
    #include <streambuf>

    class UnbufferedFD: public std::streambuf
    {
        int d_fd;           // the file descriptor to write to
                            // (initialized by the constructor, not shown here)
        public:
            int overflow(int c);
            ...
    };

    int UnbufferedFD::overflow(int c)
    {
        if (c != EOF)
        {
            if (write(d_fd, &c, 1) != 1)
                return EOF;
        }
        return c;
    }
If the argument received by overflow() is not EOF, it is written as a single character to the file descriptor; EOF is returned when that write fails.

This simple function does not use an output buffer. As the use of a buffer is strongly advised (see also the next section), the construction of a class using an output buffer will be discussed next in somewhat greater detail.

When an output buffer is used, the overflow() member will be a bit more complex, as it is now only called when the buffer is full. Once the buffer is full, we first have to flush the buffer, for which the (virtual) function streambuf::sync() is available. Since sync() is a virtual function, classes derived from std::streambuf may redefine sync() to flush a buffer std::streambuf itself doesn't know about.

Overriding sync() and using it in overflow() is not all that must be done: eventually we may have less information than fits into the buffer, so at the end of the lifetime of our special streambuf object its buffer may be only partially filled. We must therefore make sure that the buffer is flushed once our object goes out of scope. This is of course very simple: sync() should also be called by the destructor.

Now that we've considered the consequences of using an output buffer, we're almost ready to construct our derived class. We will add a couple of additional features, though.

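Ignoring the additional features for now, here is a minimal sketch of such a class. It follows the design just described, but it is a sketch only and not necessarily identical to the Annotations' actual implementation:

    #include <unistd.h>     // write(2)
    #include <cstdio>       // EOF
    #include <streambuf>

    class ofdnstreambuf: public std::streambuf
    {
        size_t  d_bufsize;
        int     d_fd;
        char   *d_buffer;

        public:
            ofdnstreambuf(int fd, size_t bufsize)
            :
                d_bufsize(bufsize),
                d_fd(fd),
                d_buffer(new char[bufsize])
            {
                setp(d_buffer, d_buffer + d_bufsize);   // define the output buffer
            }
            ~ofdnstreambuf()
            {
                sync();                     // flush any remaining characters
                delete[] d_buffer;
            }
        private:
            int sync()                      // flush the buffer to d_fd
            {
                if (pptr() > pbase())
                {
                    write(d_fd, d_buffer, pptr() - pbase());
                    setp(d_buffer, d_buffer + d_bufsize);   // reset the buffer
                }
                return 0;
            }
            int overflow(int c)             // called when the buffer is full
            {
                sync();                     // make room again
                if (c != EOF)
                {
                    *pptr() = static_cast<char>(c);     // store the pending
                    pbump(1);                           // character
                }
                return c;
            }
    };
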
In order to save some space, the successful operation of the various functions is not checked. In `real life' implementations these checks should of course not be omitted. Our class ofdnstreambuf has the characteristics just described: it installs an output buffer of a configurable size, it overrides overflow() and sync(), and its destructor makes sure that a partially filled buffer is flushed as well.

Depending on the number of arguments, the following program uses the ofdnstreambuf class to copy its standard input to file descriptor STDOUT_FILENO, which is the symbolic name of the file descriptor used for the standard output. Here is the program:
    #include <string>
    #include <iostream>
    #include <istream>
    #include "fdout.h"
    using namespace std;

    int main(int argc)
    {
        ofdnstreambuf   fds(STDOUT_FILENO, 500);
        ostream         os(&fds);

        switch (argc)
        {
            case 1:
                os << "COPYING cin LINE BY LINE\n";
                for (string  s; getline(cin, s); )
                    os << s << endl;
            break;

            case 2:
                os << "COPYING cin BY EXTRACTING TO os.rdbuf()\n";

                cin >> os.rdbuf();      // Alternatively, use:  cin >> &fds;
            break;

            case 3:
                os << "COPYING cin BY INSERTING cin.rdbuf() into os\n";
                os << cin.rdbuf();
            break;
        }
    }

23.2.2: Classes for input operations

When classes to be used for input operations are derived from std::streambuf, they should be provided with an input buffer of at least one character. The one-character input buffer allows for the use of the member functions istream::putback() or istream::unget(). Stream classes (like istream) normally allow us to unget at least one character using their member functions putback() or unget(). This is important, as these stream classes usually interface to streambuf objects. Although it is, strictly speaking, not necessary to implement a buffer in classes derived from streambuf, using buffers in these cases is strongly advised: the implementation is simple and straightforward, and the applicability of such classes will be greatly improved. Therefore, in all our classes derived from the class streambuf at least a buffer of one character will be defined.

23.2.2.1: Using a one-character buffer

When deriving a class (e.g., ifdstreambuf) from streambuf using a buffer of one character, at least its member streambuf::underflow() should be overridden, as this is the member to which all requests for input are eventually directed. Since a buffer is also needed, the member streambuf::setg() is used to inform the streambuf base class of the size of the input buffer, so that it is able to set up its input buffer pointers correctly. This will ensure that eback(), gptr(), and egptr() return correct values.

The class ifdstreambuf, developed along these lines, has the following characteristics: it keeps a one-character input buffer, it overrides underflow(), and it informs its streambuf base class of that buffer through setg().

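A minimal sketch of such a class (an illustration of the approach, not necessarily the Annotations' actual implementation):

    #include <unistd.h>     // read(2)
    #include <cstdio>       // EOF
    #include <streambuf>

    class ifdstreambuf: public std::streambuf
    {
        protected:
            int  d_fd;              // file descriptor to read from
            char d_buffer[1];       // input buffer of one character

        public:
            ifdstreambuf(int fd)
            :
                d_fd(fd)
            {
                setg(d_buffer, d_buffer + 1, d_buffer + 1); // buffer is empty
            }

        protected:
            int underflow()         // called when the buffer is exhausted
            {
                if (gptr() < egptr())       // a character is still available
                    return static_cast<unsigned char>(*gptr());

                if (read(d_fd, d_buffer, 1) <= 0)   // refill the buffer
                    return EOF;

                setg(d_buffer, d_buffer, d_buffer + 1);
                return static_cast<unsigned char>(*gptr());
            }
    };
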
The ifdstreambuf class is used in the following program:
    #include <iostream>
    #include <istream>
    #include <unistd.h>
    #include "ifdbuf.h"
    using namespace std;

    int main(int argc)
    {
        ifdstreambuf fds(STDIN_FILENO);
        istream      is(&fds);

        cout << is.rdbuf();
    }

23.2.2.2: Using an n-character buffer

How complex would things get if we decided to use a buffer of substantial size? Not that complex. The following class allows us to specify the size of a buffer, but apart from that it is basically the same class as the ifdstreambuf developed in the previous section. To make things a bit more interesting, in the class ifdnstreambuf developed here the member streambuf::xsgetn() is also overridden, to optimize the reading of series of characters. Furthermore, a default constructor is provided which can be used in combination with the open() member to construct an istream object before the file descriptor becomes available. Then, once the descriptor becomes available, the open() member can be used to initialize the object's buffer. Later, in section 23.4, we'll encounter such a situation.

To save some space, the success of various calls is not checked. In `real life' implementations these checks should, of course, not be omitted. The class ifdnstreambuf has the following characteristics: it uses a buffer whose size is specified at construction time (or when open() is called), and it overrides both underflow() and xsgetn().

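A sketch of such a class is shown below (again a sketch under the stated assumptions, not necessarily the Annotations' actual implementation):

    #include <unistd.h>     // read(2)
    #include <cstdio>       // EOF
    #include <cstring>      // std::memcpy
    #include <streambuf>

    class ifdnstreambuf: public std::streambuf
    {
        int     d_fd;
        size_t  d_bufsize;
        char   *d_buffer;

        public:
            ifdnstreambuf()                 // a usable object requires open()
            :
                d_fd(-1), d_bufsize(0), d_buffer(0)
            {}
            ifdnstreambuf(int fd, size_t bufsize = 1)
            :
                d_fd(-1), d_bufsize(0), d_buffer(0)
            {
                open(fd, bufsize);
            }
            ~ifdnstreambuf()
            {
                delete[] d_buffer;
            }
            void open(int fd, size_t bufsize = 1)
            {
                d_fd = fd;
                d_bufsize = bufsize;
                d_buffer = new char[bufsize];
                setg(d_buffer, d_buffer + bufsize, d_buffer + bufsize);
            }
        protected:
            int underflow()                 // refill the empty buffer
            {
                if (gptr() < egptr())
                    return static_cast<unsigned char>(*gptr());

                int nread = read(d_fd, d_buffer, d_bufsize);
                if (nread <= 0)
                    return EOF;

                setg(d_buffer, d_buffer, d_buffer + nread);
                return static_cast<unsigned char>(*gptr());
            }
            std::streamsize xsgetn(char *dest, std::streamsize n)
            {
                std::streamsize nread = 0;
                while (nread < n)           // copy buffered characters,
                {                           // refilling the buffer as needed
                    if (gptr() == egptr() && underflow() == EOF)
                        break;

                    std::streamsize avail = egptr() - gptr();
                    if (avail > n - nread)
                        avail = n - nread;

                    std::memcpy(dest + nread, gptr(), avail);
                    gbump(static_cast<int>(avail));     // consume them
                    nread += avail;
                }
                return nread;
            }
    };
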
The member function xsgetn() is called by streambuf::sgetn(), which is a (public) streambuf member. The following example illustrates the use of this member function with an ifdnstreambuf object:
    #include <unistd.h>
    #include <iostream>
    #include <istream>
    #include "ifdnbuf.h"
    using namespace std;

    int main(int argc)
    {
                                    // internally: 30 char buffer
        ifdnstreambuf fds(STDIN_FILENO, 30);

        char buf[80];               // main() reads blocks of 80
                                    // chars
        while (true)
        {
            size_t n = fds.sgetn(buf, 80);
            if (n == 0)
                break;
            cout.write(buf, n);
        }
    }

23.2.2.3: Seeking positions in `streambuf' objects

When devices support seek operations, classes derived from streambuf should override the members streambuf::seekoff() and streambuf::seekpos(). The class ifdseek, developed in this section, can be used to read information from devices supporting such seek operations. The class ifdseek is derived from ifdstreambuf, so it uses a character buffer of just one character. Its characteristics: it overrides seekoff() and seekpos(), and whenever one of these members is called the input buffer maintained by its base class is reset. The class could also have been derived from ifdnstreambuf; in that case the arguments used to reset the input buffer must be adapted so that its second and third parameters point beyond the available input, thereby marking the buffer as empty.

An example of a program using the class ifdseek follows. If this program is given its own source file using input redirection then seeking is supported and, with the exception of the first line, every subsequent line is shown twice:
    #include "fdinseek.h"
    #include <string>
    #include <iostream>
    #include <istream>
    #include <iomanip>
    using namespace std;

    int main(int argc)
    {
        ifdseek fds(0);
        istream is(&fds);
        string  s;

        while (true)
        {
            if (!getline(is, s))
                break;

            streampos pos = is.tellg();

            cout << setw(5) << pos << ": `" << s << "'\n";

            if (!getline(is, s))
                break;

            streampos pos2 = is.tellg();

            cout << setw(5) << pos2 << ": `" << s << "'\n";

            if (!is.seekg(pos))
            {
                cout << "Seek failed\n";
                break;
            }
        }
    }

23.2.2.4: Multiple `unget()' calls in `streambuf' objects

As mentioned before, streambuf classes and classes derived from streambuf should support at least ungetting the last read character. Special care must be taken when series of unget() calls must be supported. In this section the construction of a class supporting a configurable number of istream::unget() or istream::putback() calls is discussed.

Support for multiple (say `n') unget() calls is implemented by reserving an initial section of the input buffer, which is gradually filled to contain the last n characters read. The class fdunget implements this approach.

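A sketch of such a class (a possible implementation following the approach just described, not necessarily the Annotations' actual source):

    #include <unistd.h>     // read(2)
    #include <cstdio>       // EOF
    #include <cstring>      // std::memmove
    #include <streambuf>

    class fdunget: public std::streambuf
    {
        int     d_fd;
        size_t  d_reserved;     // size of the guaranteed unget-area
        size_t  d_bufsize;      // size of the area used for reading
        char   *d_buffer;
        char   *d_base;         // the read-area starts here

        public:
            fdunget(int fd, size_t bufsz, size_t unget)
            :
                d_fd(fd),
                d_reserved(unget)
            {
                size_t allocate = bufsz > d_reserved ?
                                        bufsz
                                    :
                                        d_reserved + 1;

                d_buffer = new char[allocate];
                d_base = d_buffer + d_reserved;     // read-area beyond the
                d_bufsize = allocate - d_reserved;  // reserved unget-area

                setg(d_base, d_base, d_base);       // no characters available
            }
            ~fdunget()
            {
                delete[] d_buffer;
            }
        protected:
            int underflow()
            {
                if (gptr() < egptr())
                    return static_cast<unsigned char>(*gptr());

                                // keep the most recently read characters in
                                // the reserved unget-area
                size_t consumed = gptr() - eback();
                size_t keep = consumed > d_reserved ? d_reserved : consumed;
                std::memmove(d_base - keep, gptr() - keep, keep);

                int nread = read(d_fd, d_base, d_bufsize);
                if (nread <= 0)
                    return EOF;

                setg(d_base - keep, d_base, d_base + nread);
                return static_cast<unsigned char>(*gptr());
            }
    };
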
The following program illustrates the class fdunget. It reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined in a buffer holding 3 characters. Just before reading a character, the program tries to unget at most 6 characters. This is, of course, not possible; but the program will nicely unget as many characters as possible, considering the actual number of characters read:
    #include "fdunget.h"
    #include <string>
    #include <iostream>
    #include <istream>
    using namespace std;

    int main(int argc)
    {
        fdunget fds(0, 3, 2);
        istream is(&fds);
        char    c;

        for (int idx = 0; idx < 10; ++idx)
        {
            cout << "after reading " << idx << " characters:\n";
            for (int ug = 0; ug <= 6; ++ug)
            {
                if (!is.unget())
                {
                    cout
                    << "\tunget failed at attempt " << (ug + 1) << "\n"
                    << "\trereading: '";

                    is.clear();
                    while (ug--)
                    {
                        is.get(c);
                        cout << c;
                    }
                    cout << "'\n";
                    break;
                }
            }

            if (!is.get(c))
            {
                cout << " reached\n";
                break;
            }
            cout << "Next character: " << c << endl;
        }
    }
    /*
        Generated output after 'echo abcde | program':

        after reading 0 characters:
                unget failed at attempt 1
                rereading: ''
        Next character: a
        after reading 1 characters:
                unget failed at attempt 2
                rereading: 'a'
        Next character: b
        after reading 2 characters:
                unget failed at attempt 3
                rereading: 'ab'
        Next character: c
        after reading 3 characters:
                unget failed at attempt 4
                rereading: 'abc'
        Next character: d
        after reading 4 characters:
                unget failed at attempt 4
                rereading: 'bcd'
        Next character: e
        after reading 5 characters:
                unget failed at attempt 4
                rereading: 'cde'
        Next character:

        after reading 6 characters:
                unget failed at attempt 4
                rereading: 'de
        '
         reached
    */

23.3: Fixed-sized field extraction from istream objects

Usually, when extracting information from istream objects, operator>>, the standard extraction operator, is perfectly suited for the task, as in most cases the extracted fields are white-space or otherwise clearly separated from each other. But this does not hold true in all situations. For example, when a web-form is posted to some processing script or program, the receiving program may receive the form fields' values as url-encoded characters: letters and digits are sent unaltered, blanks are sent as + characters, and all other characters are sent as a % followed by the character's ascii-value, represented by its two-digit hexadecimal value.

When decoding url-encoded information, a simple hexadecimal extraction won't work, since that will extract as many hexadecimal characters as available, instead of just two. Since the letters a-f and 0-9 are legal hexadecimal characters, a text like My name is `Ed', url-encoded as

        My+name+is+%60Ed%27
will result in the extraction of the hexadecimal values 60ed and 27, instead of 60 and 27. The name Ed will disappear from view, which is clearly not what we want.

In this case, having seen the %, we could extract 2 characters, put them in an istringstream object, and extract the hexadecimal value from the istringstream object. A bit cumbersome, but doable. Other approaches, however, are possible as well.

The following class, fistream (for `fixed-sized field istream'), defines an istream class supporting both fixed-sized field extractions and blank-delimited extractions (as well as unformatted read() calls). The class may be initialized as a wrapper around an existing istream, or it can be initialized using the name of an existing file. The class is derived from istream, allowing all extractions and operations supported by istreams in general. The class will need the following data members:

Here is the initial section of fistream's class interface:
    class fistream: public std::istream
    {
        std::auto_ptr<std::filebuf> d_filebuf;
        std::streambuf *d_streambuf;
        std::istringstream d_iss;
        size_t d_width;

As mentioned, fistream objects can be constructed from either a filename or an existing istream object. Thus, the class interface shows two constructors:

            fistream(std::istream &stream);
            fistream(char const *name,
                std::ios::openmode mode = std::ios::in);

When an fistream object is constructed using an existing istream object, the fistream's istream part will simply use the stream's streambuf object:

fistream::fistream(istream &stream)
:
    istream(stream.rdbuf()),
    d_streambuf(rdbuf()),
    d_width(0)
{}

When an fistream object is constructed using a filename, the istream base initializer is given a new filebuf object to be used as its streambuf. Since the class's data members are not initialized before the class's base class has been constructed, d_filebuf can only be initialized thereafter. By then, the filebuf is only available as rdbuf(), which returns a streambuf. However, as it actually is a filebuf, a static_cast is used to cast the streambuf pointer returned by rdbuf() to a filebuf *, so d_filebuf can be initialized:

fistream::fistream(char const *name, ios::openmode mode)
:
    istream(new filebuf()),
    d_filebuf(static_cast<filebuf *>(rdbuf())),
    d_streambuf(d_filebuf.get()),
    d_width(0)
{
    d_filebuf->open(name, mode);
}

There is only one additional public member: setField(field const &). This member is used to define the size of the next field to extract. Its parameter is a reference to a field class, a manipulator class defining the width of the next field.

Since a field & is mentioned in fistream's interface, field must be declared before fistream's interface starts. The class field itself is simple: it declares fistream as its friend, and it has two data members: d_width, specifying the width of the next field, and d_newWidth, which is set to true if d_width's value should actually be used. If d_newWidth is false, fistream returns to its standard extraction mode. The class field furthermore has two constructors: a default constructor, setting d_newWidth to false, and a second constructor expecting the width of the next field to extract as its argument. Here is the class field:

    class field
    {
        friend class fistream;
        size_t d_width;
        bool     d_newWidth;

        public:
            field(size_t width);
            field();
    };

    inline field::field(size_t width)
    :
        d_width(width),
        d_newWidth(true)
    {}

    inline field::field()
    :
        d_newWidth(false)
    {}

Since field declares fistream as its friend, setField may inspect field's members directly.

Time to return to setField(). This function expects a reference to a field object, initialized in one of three ways: as field(n) (with n > 0), requesting fixed-sized extraction of fields of n characters; as field(0), switching back to standard, blank-delimited extraction; or as field() (the default constructor), reusing the field width specified earlier.

Here is setField()'s implementation:
std::istream &fistream::setField(field const &params)
{
    if (params.d_newWidth)                  // new field size requested
        d_width = params.d_width;           // set new width

    if (!d_width)                           // no width?
        rdbuf(d_streambuf);                 // return to the old buffer
    else
        setBuffer();                        // define the extraction buffer

    return *this;
}

The private member setBuffer() defines a buffer of d_width + 1 characters, and uses read() to fill the buffer with d_width characters. The buffer is terminated by an ASCII-Z character. This buffer is then used to initialize the d_iss member. Finally, rdbuf() is used to let the fistream object itself extract its information from d_iss's streambuf:

void fistream::setBuffer()
{
    char *buffer = new char[d_width + 1];

    rdbuf(d_streambuf);                     // use istream's buffer to
    buffer[read(buffer, d_width).gcount()] = 0; // read d_width chars,
                                                // terminated by ascii-Z
    d_iss.str(buffer);
    delete[] buffer;

    rdbuf(d_iss.rdbuf());                   // switch buffers
}

Although setField() could be called directly to switch fixed-sized field extraction on or off, using manipulators is probably preferable. To allow field objects to be used as manipulators, an overloaded extraction operator was defined, accepting an istream & and a field const & object. Using this extraction operator, statements like

fis >> field(2) >> x >> field(0);
are possible (assuming fis is a fistream object). Here is the overloaded operator>>, as well as its declaration:
istream &std::operator>>(istream &str, field const &params)
{
    return static_cast<fistream *>(&str)->setField(params);
}

Declaration:

namespace std
{
    istream &operator>>(istream &str, FBB::field const &params);
}

Finally, an example. The following program uses a fistream object to url-decode url-encoded information appearing at its standard input:

    #include <iostream>
    // assuming the fistream and field classes shown above are available
    // (e.g., through a header file), and `using namespace std':
    int main()
    {
        fistream fis(cin);

        fis >> hex;
        while (true)
        {
            size_t x;
            switch (x = fis.get())
            {
                case '\n':
                    cout << endl;
                break;
                case '+':
                    cout << ' ';
                break;
                case '%':
                    fis >> field(2) >> x >> field(0);
                // FALLING THROUGH
                default:
                    cout << static_cast<char>(x);
                break;
                case EOF:
                return 0;
            }
        }
    }
    /*
        Generated output after:
            echo My+name+is+%60Ed%27 | a.out

        My name is `Ed'
    */

23.4: The `fork()' system call

From the C programming language, the fork() system call is well known. When a program needs to start a new process, system() can be used, but this requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to call fork().

In this section we will see how C++ can be used to wrap classes around a complex system call like fork(). Much of what follows in this section directly applies to the Unix operating system, and the discussion therefore focuses on that operating system. However, other systems usually provide comparable facilities. The following discussion is heavily based on the notion of design patterns (cf. Gamma et al. (1995), Design Patterns, Addison-Wesley).

When fork() is called, the current program is duplicated in memory, thus creating a new process, and both processes continue their execution just below the fork() system call. The two processes may, however, inspect fork()'s return value: in the original process (called the parent process) fork() returns the process-id of the newly created process (called the child process), while in the child process itself fork() returns 0. A negative return value indicates that no new process could be created.

A basic Fork class should hide all bookkeeping details of a system call like fork() from its users. The class Fork developed here will do just that. The class itself only needs to take care of the proper execution of the fork() system call. Normally, fork() is called to start a child process, usually boiling down to the execution of a separate process. This child process may expect input at its standard input stream and/or may generate output to its standard output and/or standard error streams. Fork does not know all this, and does not have to know what the child process will do. However, Fork objects should be able to activate their child processes.

Unfortunately, Fork's constructor cannot know what actions its child process should perform. Similarly, it cannot know what actions the parent process should perform. For this particular situation the template method design pattern was developed. According to Gamma et al., the template method design pattern

``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. (The) Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''

This design pattern allows us to define an abstract base class already providing the essential steps related to the fork() system call and deferring the implementation of certain normally used parts of the fork() system call to subclasses.

The Fork abstract base class itself has the following characteristics:

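A sketch of the interface of such a class, limited to the members used in this and the following sections (the exact layout is an assumption):

    #include <sys/types.h>

    class Fork
    {
        pid_t d_pid;

        public:
            virtual ~Fork()
            {}
            void fork();                // calls ::fork(), see below
            int waitForChild();         // returns the child's exit status
                                        // (used by parent processes)
        protected:
            pid_t pid() const           // the child's pid, in the parent
            {
                return d_pid;
            }
            virtual void childProcess() = 0;    // must be implemented
            virtual void parentProcess() = 0;

            virtual void childRedirections()    // optionally redefined by
            {}                                  // derived classes
            virtual void parentRedirections()
            {}
    };
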
The member function fork() calls the system function fork() (caution: since the system function fork() is called by a member function having the same name, the :: scope resolution operator must be used to prevent a recursive call of the member function itself). After calling ::fork(), depending on its return value, either parentProcess() or childProcess() is called. Redirection, when necessary, is handled as well: Fork::fork()'s implementation calls childRedirections() just before calling childProcess(), and parentRedirections() just before calling parentProcess():
    #include "fork.ih"

    void Fork::fork()
    {
        if ((d_pid = ::fork()) < 0)
            throw "Fork::fork() failed";

        if (d_pid == 0)                 // childprocess has pid == 0
        {
            childRedirections();
            childProcess();

            exit(1);                    // we shouldn't come here:
                                        // childProcess() should exit
        }

        parentRedirections();
        parentProcess();
    }
In fork.cc the class's internal header file fork.ih is included. This header file takes care of the inclusion of the necessary system header files, as well as the inclusion of fork.h itself. Its implementation is:
    #include "fork.h"
    #include <cstdlib>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

Child processes should not return: once they have completed their tasks they should terminate. This happens automatically when the child process calls a member of the exec...() family, but if the child itself remains active it must make sure that it terminates properly. A child process normally uses exit() to terminate itself, but note that exit() does not activate the destructors of local objects defined by the function calling exit() or by its callers; destructors of globally defined objects are activated, though. When using exit() to terminate childProcess(), the child should therefore either define all the objects it needs inside a support member function (which may then return normally), or it should define its objects inside a nested compound statement, calling exit() beyond that compound statement.

Parent processes should normally wait for their children to complete. A terminating child process informs its parent that it is about to terminate by sending out a signal that should be caught by the parent. If child processes terminate and their parents do not catch those signals, the child processes remain visible as so-called zombie processes.

If parent processes must wait for their children to complete, they may call the member waitForChild(). This member returns the exit status of a child process to its parent.

There exists a situation where the child process continues to live, but the parent dies. In nature this happens all the time: parents tend to die before their children do. In our context (i.e. C++), this is called a daemon program: the parent process dies and the child program continues to run as a child of the basic init process. Again, when the child eventually dies a signal is sent to its `step-parent' init. No zombie is created here, as init catches the termination signals of all its (step-) children. The construction of a daemon process is very simple, given the availability of the class Fork (cf. section 23.4.2).

23.4.1: Redirection revisited

Earlier, in section 6.6.1, it was noted that within a C++ program, streams could be redirected using the ios::rdbuf() member function. By assigning the streambuf of a stream to another stream, both stream objects access the same streambuf, thus implementing redirection at the level of the programming language itself.

Note that this is fine within the context of the C++ program, but if that context is left, the redirection terminates, as the operating system does not know about streambuf objects. This happens, e.g., when a program uses a system() call to start a subprogram. The program at the end of this section uses C++ redirection to redirect the information inserted into cout to a file, and then calls

    system("echo hello world")
to echo a well-known line of text. Since echo writes its information to the standard output, this would be the program's redirected file if C++'s redirection were recognized by the operating system.

Actually, this doesn't happen, and hello world still appears at the program's standard output instead of in the redirected file. A solution to this problem involves redirection at the operating system level, for which some operating systems (e.g., Unix and friends) provide system calls like dup() and dup2(). Examples using these system calls are given in section 23.4.3.

Here is the example of the failing redirection at the system level following C++ redirection using streambuf redirection:

    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    using namespace std;

    int main()
    {
        ofstream of("outfile");

        cout.rdbuf(of.rdbuf());
        cout << "To the of stream" << endl;
        system("echo hello world");
        cout << "To the of stream" << endl;
    }
    /*
        Generated output: on the file `outfile'

        To the of stream
        To the of stream

        On standard output:

        hello world
    */

23.4.2: The `Daemon' program

Applications exist in which the only purpose of fork() is to start a child process. The parent process terminates immediately after spawning the child process. If this happens, the child process continues to run as a child process of init, the always running first process on Unix systems. Such a process is often called a daemon, running as a background process.

Although the following example can easily be constructed as a plain C program, it was included in the C++ Annotations because it is so closely related to the current discussion of the Fork class. I thought about adding a daemon() member to that class, but eventually decided against it because the construction of a daemon program is very simple and requires no features other than those currently offered by the class Fork. Here is an example illustrating the construction of a daemon program:

    #include <iostream>
    #include <unistd.h>
    #include "fork.h"

    class Daemon: public Fork
    {
        public:
            virtual void parentProcess()        // the parent does nothing.
            {}

            virtual void childProcess()
            {
                sleep(3);                       // actions taken by the child
                                                // just a message...
                std::cout << "Hello from the child process\n";
                exit (0);                       // The child process exits.
            }
    };

    int main()
    {
        Daemon daemon;

        daemon.fork();          // program immediately returns
        return 0;
    }

    /*
        Generated output:
    The next command prompt, then after 3 seconds:
    Hello from the child process
    */

23.4.3: The class `Pipe'

Redirection at the system level involves the use of file descriptors, created by the pipe() system call. When two processes want to communicate using such file descriptors, the following takes place: one process creates the pipe (a pair of file descriptors, one for reading and one for writing) and then forks; each process closes the descriptor it does not use, whereafter the information written by one process to the pipe's write end is read by the other process from the pipe's read end. Though basically simple, errors may easily creep in: the purposes of the file descriptors available to the two (child and parent) processes may easily get mixed up. To prevent bookkeeping errors, the bookkeeping can be set up properly once and hidden thereafter inside a class like the Pipe class constructed here (before its member implementations can be compiled, the compiler must have read the class's header file as well as the file unistd.h). Once redirection can be configured easily using one or more Pipe objects, Fork and Pipe are used in several demonstration programs in the sections that follow.
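
A minimal sketch of such a Pipe class is shown below. The member names (readFrom(), writtenBy()) and the other details are illustrative assumptions, not necessarily the Annotations' actual interface:

    #include <unistd.h>     // pipe(2), dup2(2), close(2)

    class Pipe
    {
        enum { READ, WRITE };
        int d_fd[2];

        public:
            Pipe()                          // create the pipe
            {
                if (pipe(d_fd) < 0)
                    throw "Pipe: pipe(2) failed";
            }
            void readFrom(int fd)           // fd (e.g., STDIN_FILENO) will
            {                               // read from the pipe
                redirect(d_fd[READ], fd);
                close(d_fd[WRITE]);         // write-end unused in this process
            }
            void writtenBy(int fd)          // whatever is written to fd ends
            {                               // up in the pipe
                redirect(d_fd[WRITE], fd);
                close(d_fd[READ]);          // read-end unused in this process
            }
        private:
            void redirect(int pipefd, int stdfd)
            {
                dup2(pipefd, stdfd);        // stdfd now refers to the pipe
                close(pipefd);              // the original descriptor may go
            }
    };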

23.4.4: The class `ParentSlurp'

The class ParentSlurp, derived from Fork, starts a child process which execs a program (like /bin/ls). The (standard) output of the execed program is then read by the parent process. The parent process (for demonstration purposes) writes the lines it receives to its standard output stream, prepending line numbers to them. It is most convenient here to redirect the parent's standard input stream, so that the parent can read the child's output from its std::cin input stream. Therefore, the only pipe that is used acts as an input pipe for the parent and as an output pipe for the child.

The class ParentSlurp has the following characteristics:

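A sketch of such a class, built on top of the Fork class and the Pipe class sketched earlier (the header file names and the implementation details are assumptions):

    #include <iostream>
    #include <string>
    #include <cstdlib>
    #include <unistd.h>
    #include "fork.h"       // the Fork class
    #include "pipe.h"       // the Pipe class (assumed header name)

    class ParentSlurp: public Fork
    {
        Pipe d_pipe;        // created before fork(): available to both
                            // the parent and the child process
        virtual void childRedirections()
        {
            d_pipe.writtenBy(STDOUT_FILENO);    // child: stdout into the pipe
        }
        virtual void parentRedirections()
        {
            d_pipe.readFrom(STDIN_FILENO);      // parent: stdin from the pipe
        }
        virtual void childProcess()
        {
            execl("/bin/ls", "/bin/ls", static_cast<char *>(0));
            exit(1);                            // exec should not return
        }
        virtual void parentProcess()            // number the received lines
        {
            std::string line;
            size_t nr = 0;

            while (getline(std::cin, line))
                std::cout << ++nr << ": " << line << '\n';

            waitForChild();
        }
    };
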
The following program simply constructs a ParentSlurp object, and calls its fork() member. Its output consists of a numbered list of files in the directory where the program is started. Note that the program also needs the fork.o, pipe.o and waitforchild.o object files (see earlier sources):
    int main()
    {
        ParentSlurp ps;

        ps.fork();
        return 0;
    }
    /*
        Generated Output (example only, actually obtained output may differ):

        1: a.out
        2: bitand.h
        3: bitfunctional
        4: bitnot.h
        5: daemon.cc
        6: fdinseek.cc
        7: fdinseek.h
        ...
    */

23.4.5: Communicating with multiple children

The next step up the ladder is the construction of a child-process monitor. Here the parent process is responsible for all its child processes, but it must also read their standard output. The user enters information at the parent process's standard input, for which a simple command language is defined: start starts a new child process; <nr> text sends text to child process <nr>; stop <nr> terminates child process <nr>; and exit (or quit) ends the program. Furthermore, a child process that hasn't received text for some time will complain by sending a message to the parent process. The parent process simply passes such a message on to the user by copying it to the standard output stream.

A problem with programs like our monitor is that these programs allow asynchronous input from multiple sources: input may appear at the standard input as well as at the input-sides of pipes. Also, multiple output channels are used. To handle situations like these, the select() system call was developed.

23.4.5.1: The class `Selector'

The select() system call was developed to handle asynchronous I/O multiplexing. This system call can be used to handle, e.g., input appearing simultaneously at a set of file descriptors.

The select() system function is rather complex, and its full discussion is beyond the C++ Annotations' scope. However, its use can be simplified by providing a class Selector, hiding its details and offering an easy-to-use public interface, discussed next.

The members addReadFd(), wait() and readFd(), used in the following sections, are part of the class's public interface. The class's remaining (two) members are support members which should not be used by non-member functions; they are therefore declared in the class's private section.
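
A sketch of such a class, restricted to the public members used in the next section (the implementation details are assumptions):

    #include <sys/select.h>

    class Selector
    {
        fd_set d_read;          // descriptors monitored for available input
        fd_set d_ret;           // set returned by the last select() call
        int    d_max;           // highest descriptor + 1
        int    d_next;          // next descriptor inspected by readFd()

        public:
            Selector()
            :
                d_max(0),
                d_next(0)
            {
                FD_ZERO(&d_read);
                FD_ZERO(&d_ret);
            }
            void addReadFd(int fd)      // monitor fd for available input
            {
                FD_SET(fd, &d_read);
                if (fd >= d_max)
                    d_max = fd + 1;
            }
            void wait()                 // block until input is available
            {
                d_ret = d_read;
                d_next = 0;
                if (select(d_max, &d_ret, 0, 0, 0) < 0)
                    throw "Selector::wait(): select() failed";
            }
            int readFd()                // next fd offering input, -1 if none
            {
                while (d_next < d_max)
                {
                    int fd = d_next++;
                    if (FD_ISSET(fd, &d_ret))
                        return fd;
                }
                return -1;
            }
    };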

23.4.5.2: The class `Monitor'

The monitor program uses a Monitor object to do most of the work. The class has only one public constructor and one public member, run(), to perform its tasks. Therefore, all other member functions described below should be declared in the class's private section.

Monitor defines the private enum Commands, symbolically listing the various commands its input language supports, as well as several data members, among which a Selector object and a map whose keys are the file descriptors used for reading the children's output and whose values are pointers to Child objects (see section 23.4.5.3). Furthermore, Monitor has a static array member s_handler[], storing pointers to the member functions handling the user's commands.

A destructor should have been implemented too, but its implementation is left as an exercise to the reader. Before the class interface can be processed by the compiler, it must have seen select.h and child.h. Here is the class header, including the interface of the nested function object class Find:

    class Monitor
    {
        enum Commands
        {
            UNKNOWN,
            START,
            EXIT,
            STOP,
            TEXT,
            sizeofCommands
        };

        class Find
        {
            int     d_nr;
            public:
                Find(int nr);
                bool operator()(std::map<int, Child *>::value_type &vt)
                                                                  const;
        };

        Selector                d_selector;
        int                     d_nr;
        std::map<int, Child *>  d_child;

        static void (Monitor::*s_handler[])(int, std::string const &);

        public:
            enum Done
            {};

            Monitor();
            void run();

        private:
            static void killChild(std::map<int, Child *>::value_type it);
            static void initialize();

            Commands    next(int *value, std::string *line);
            void    processInput();
            void    processChild(int fd);

            void    createNewChild(int, std::string const &);
            void    exiting(int = 0, std::string const &msg = std::string());
            void    sendChild(int value, std::string const &line);
            void    stopChild(int value, std::string const &);
            void    unknown(int, std::string const &);
    };

Since there's only one non-class type data member, the class's constructor remains very short and could be implemented inline. However, the array s_handler, storing the pointers to the member functions handling user commands, needs to be initialized as well. This can be accomplished in several ways; here a static member function initialize(), called by the constructor, performs the initialization the first time a Monitor object is constructed.

Since the initialization function immediately returns if the initialization has already been performed, Monitor's constructor may call the initialization and still defensibly be implemented inline:
    inline Monitor::Monitor()
    :
        d_nr(0)
    {
        initialize();
    }

The core of Monitor's activities is performed by its member run(). It adds the standard input's file descriptor to its Selector object and then repeatedly waits for input to arrive, passing input appearing at the standard input to processInput() and input generated by the children to processChild().

Here is run()'s implementation:
    #include "monitor.ih"

    void Monitor::run()
    {
        d_selector.addReadFd(STDIN_FILENO);

        while (true)
        {
            cout << "? " << flush;
            try
            {
                d_selector.wait();

                int fd;
                while ((fd = d_selector.readFd()) != -1)
                {
                    if (fd == STDIN_FILENO)
                        processInput();
                    else
                        processChild(fd);
                }
                cout << "NEXT ...\n";

            }
            catch (char const *msg)
            {
                exiting(1, msg);
            }
        }
    }

The member function processInput() reads the commands entered by the user via the program's standard input stream. The member itself is rather simple: it calls next() to obtain the next command entered by the user, and then calls the corresponding function using the matching element of the s_handler[] array. The members processInput() and next() were defined as follows:

    void Monitor::processInput()
    {
        string line;
        int    value;
        Commands cmd = next(&value, &line);
        (this->*s_handler[cmd])(value, line);
    }

    Monitor::Commands Monitor::next(int *value, string *line)
    {
        if (!getline(cin, *line))
            exiting(1, "Command::next(): reading cin failed");

        if (*line == "start")
            return START;

        if (*line == "exit" || *line == "quit")
        {
            *value = 0;
            return EXIT;
        }

        if (line->find("stop") == 0)
        {
            istringstream istr(line->substr(4));
            istr >> *value;
            return !istr ? UNKNOWN : STOP;
        }

        istringstream istr(line->c_str());
        istr >> *value;
        if (istr)
        {
            getline(istr, *line);
            return TEXT;
        }

        return UNKNOWN;
    }

All other input sensed by d_selector was generated by child processes. Because d_selector's readFd() member returns the corresponding input file descriptor, this descriptor can be passed to processChild(). Then, using an ifdstreambuf (see section 23.2.2.1), the child's information is read from an input stream. The communication protocol used here is rather basic: for every line of input sent to a child, the child sends exactly one line of text in return. Consequently, processChild() just has to read one line of text:

    void Monitor::processChild(int fd)
    {
        ifdstreambuf ifdbuf(fd);
        istream istr(&ifdbuf);
        string line;

        getline(istr, line);
        cout << d_child[fd]->pid() << ": " << line << endl;
    }

Please note the construction d_child[fd]->pid() used in the above source. Monitor defines the data member map<int, Child *> d_child. This map uses the file descriptor for reading a child's output as its key, and a pointer to the corresponding Child object as its value. A pointer is used here, rather than a Child object, since we do want to use the facilities offered by the map, but don't want to copy a Child object.

The implication of using pointers as map values is of course that the responsibility to destroy the Child objects once they become superfluous now lies with the programmer, and no longer with the run-time support system.

Now that run()'s implementation has been covered, we turn to the various commands users may enter. They are handled by the members createNewChild(), sendChild(), stopChild(), exiting() and unknown(), declared in the class interface shown earlier.

Finally, the program's main() function is simply:
    #include "monitor.h"

    int main()
    try
    {
        Monitor monitor;

        monitor.run();
    }
    catch (int exitValue)
    {
        return exitValue;
    }

    /*
        Example of a session:

        # a.out
        ? start
        Child 1 started
        ? 1 hello world
        ? 3394: Child 1:1:  hello world
        ? 1 hi there!
        ? 3394: Child 1:2:  hi there!
        ? start
        Child 2 started
        ? 3394: Child 1: standing by
        ? 3395: Child 2: standing by
        ? 3394: Child 1: standing by
        ? 3395: Child 2: standing by
        ? stop 1
        ? 3395: Child 2: standing by
        ? 2 hello world
        ? 3395: Child 2:1:  hello world
        ? 1 hello world
        No child number 1
        ? exit3395: Child 2: standing by
        ?
        #
    */

23.4.5.3: The class `Child'

When the Monitor object starts a child process, it creates an object of the class Child. The Child class is derived from the class Fork, allowing its construction as a daemon-like process (cf. section 23.4.2). Since a Child object behaves like a daemon, its parentProcess() is defined as an empty function; its childProcess() must of course still be implemented. Its implementation follows the same pattern as the classes developed in the previous sections.

23.5: Function objects performing bitwise operations

In section 18.1 several types of predefined function objects were introduced. Predefined function objects performing arithmetic operations, relational operations, and logical operations exist, corresponding to a multitude of binary- and unary operators.

Some operators appear to be missing: there are no predefined function objects corresponding to bitwise operations. However, given the available predefined function objects their construction is not difficult. The following examples show a class template implementing a function object calling the bitwise and (operator&()), and a class template implementing a function object calling the bitwise not (operator~()). It is left to the reader to construct similar function objects for other operators.

Here is the implementation of a function object calling the bitwise operator&():

    #include <functional>

    template <typename _Tp>
    struct bit_and: public std::binary_function<_Tp, _Tp, _Tp>
    {
        _Tp operator()(_Tp const &__x, _Tp const &__y) const
        {
            return __x & __y;
        }
    };

Here is the implementation of a function object calling operator~():

    #include <functional>

    template <typename _Tp>
    struct bit_not: public std::unary_function<_Tp, _Tp>
    {
        _Tp operator()(_Tp const &__x) const
        {
            return ~__x;
        }
    };

These and other missing predefined function objects are also implemented in the file bitfunctional, which is found in the cplusplus.yo.zip archive. It should be noted that these classes are derived from existing class templates (e.g., std::binary_function and std::unary_function). These base classes offer several typedefs which are expected (used) by various generic algorithms as defined in the STL (cf. chapter 19), thus following the advice offered in, e.g., the C++ header file bits/stl_function.h:

   *  The standard functors are derived from structs named unary_function
   *  and binary_function.  These two classes contain nothing but typedefs,
   *  to aid in generic (template) programming.  If you write your own
   *  functors, you might consider doing the same.

Here is an example using bit_and() removing all odd numbers from a vector of int values:

    #include <iostream>
    #include <algorithm>
    #include <iterator>
    #include <vector>
    #include "bitand.h"
    using namespace std;

    int main()
    {
        vector<int> vi;

        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        copy
        (
            vi.begin(),
            remove_if(vi.begin(), vi.end(), bind2nd(bit_and<int>(), 1)),
            ostream_iterator<int>(cout, " ")
        );
        cout << endl;
    }
    /*
        Generated output:

        0 2 4 6 8
    */

23.6: Implementing a `reverse_iterator'

Earlier, in section 21.12.1, the construction of iterators and reverse iterators was discussed. In that section the iterator was constructed as an inner class in a class derived from a vector of pointers to strings.

An object of this nested iterator class handled the dereferencing of the pointers stored in the vector. This allowed us to sort the strings pointed to by the vector's elements rather than the pointers.

A drawback of the approach taken in section 21.12.1 is that the class implementing the iterator is closely tied to the derived class as the iterator class was implemented as a nested class. What if we would like to provide any class derived from a container class storing pointers with an iterator handling the pointer-dereferencing?

In this section a variant to the earlier (nested class) approach is discussed. The iterator class will be defined as a class template, parameterizing the data type to which the container's elements point as well as the iterator type of the container itself. Once again, we will implement a RandomIterator as it is the most complex iterator type.

Our class is named RandomPtrIterator, indicating that it is a random access iterator operating on pointer values. The class template defines three template type parameters: Class, the class whose begin() and end() members will return RandomPtrIterator objects; BaseIterator, the iterator type of the underlying container; and Type, the type to which the container's elements point.

RandomPtrIterator uses one private data element, a BaseIterator. Here is the class interface, including the constructor's implementation:
    #include <iterator>

    template <typename Class, typename BaseIterator, typename Type>
    class RandomPtrIterator:
          public std::iterator<std::random_access_iterator_tag, Type>
    {
        friend RandomPtrIterator<Class, BaseIterator, Type> Class::begin();
        friend RandomPtrIterator<Class, BaseIterator, Type> Class::end();

        BaseIterator d_current;

        RandomPtrIterator(BaseIterator const &current);

        public:
            bool operator!=(RandomPtrIterator const &other) const;
            int operator-(RandomPtrIterator const &rhs) const;
            RandomPtrIterator const operator+(int step) const;
            Type &operator*() const;
            bool operator<(RandomPtrIterator const &other) const;
            RandomPtrIterator &operator--();
            RandomPtrIterator const operator--(int);
            RandomPtrIterator &operator++();
            RandomPtrIterator const operator++(int);
            bool operator==(RandomPtrIterator const &other) const;
            RandomPtrIterator const operator-(int step) const;
            RandomPtrIterator &operator-=(int step);
            RandomPtrIterator &operator+=(int step);
            Type *operator->() const;
    };

    template <typename Class, typename BaseIterator, typename Type>
    RandomPtrIterator<Class, BaseIterator, Type>::RandomPtrIterator(
                                    BaseIterator const &current)
    :
        d_current(current)
    {}

Dissecting its friend declarations, we see that the members begin() and end() of a class Class, returning a RandomPtrIterator object for the types Class, BaseIterator and Type are granted access to RandomPtrIterator's private constructor. That is exactly what we want. Note that begin() and end() are declared as bound friends.

All RandomPtrIterator's remaining members are public. Since RandomPtrIterator is just a generalization of the nested class iterator developed in section 21.12.1, re-implementing the required member functions is easy, and only requires us to change iterator into RandomPtrIterator and to change std::string into Type. For example, operator<(), defined in the class iterator as

inline bool StringPtr::iterator::operator<(iterator const &other) const
{
    return **d_current < **other.d_current;
}

is re-implemented as:

    template <typename Class, typename BaseIterator, typename Type>
    bool RandomPtrIterator<Class, BaseIterator, Type>::operator<(
                                    RandomPtrIterator const &other) const
    {
        return **d_current < **other.d_current;
    }

As a second example: operator*(), defined in the class iterator as

inline std::string &StringPtr::iterator::operator*() const
{
    return **d_current;
}

is re-implemented as:

    template <typename Class, typename BaseIterator, typename Type>
    Type &RandomPtrIterator<Class, BaseIterator, Type>::operator*() const
    {
        return **d_current;
    }

The pre- and postfix increment operators are re-implemented as:

    template <typename Class, typename BaseIterator, typename Type>
    RandomPtrIterator<Class, BaseIterator, Type>
    &RandomPtrIterator<Class, BaseIterator, Type>::operator++()
    {
        ++d_current;
        return *this;
    }
    template <typename Class, typename BaseIterator, typename Type>
    RandomPtrIterator<Class, BaseIterator, Type> const
    RandomPtrIterator<Class, BaseIterator, Type>::operator++(int)
    {
        return RandomPtrIterator(d_current++);
    }

Remaining members can be implemented accordingly, their actual implementations are left as an exercise to the reader (or can be obtained from the cplusplus.yo.zip archive, of course).

Reimplementing the class StringPtr developed in section 21.12.1 is not difficult either. Apart from including the header file defining the class template RandomPtrIterator, it requires only a single modification as its iterator typedef must now be associated with a RandomPtrIterator. Here are the full class interface and inline member definitions:

    #ifndef INCLUDED_STRINGPTR_H_
    #define INCLUDED_STRINGPTR_H_

    #include <vector>
    #include <string>
    #include "iterator.h"

    class StringPtr: public std::vector<std::string *>
    {
        public:
            typedef RandomPtrIterator
                    <
                        StringPtr,
                        std::vector<std::string *>::iterator,
                        std::string
                    >
                        iterator;

            typedef std::reverse_iterator<iterator> reverse_iterator;

            iterator begin();
            iterator end();
            reverse_iterator rbegin();
            reverse_iterator rend();
    };

    inline StringPtr::iterator StringPtr::begin()
    {
        return iterator(this->std::vector<std::string *>::begin() );
    }
    inline StringPtr::iterator StringPtr::end()
    {
        return iterator(this->std::vector<std::string *>::end());
    }
    inline StringPtr::reverse_iterator StringPtr::rbegin()
    {
        return reverse_iterator(end());
    }
    inline StringPtr::reverse_iterator StringPtr::rend()
    {
        return reverse_iterator(begin());
    }
    #endif

Including StringPtr's modified header file into the program given in section 21.12.2 will result in a program behaving identically to its earlier version, albeit that StringPtr::begin() and StringPtr::end() now return iterator objects constructed from a template definition.

23.7: A text to anything converter

The standard C library offers conversion functions like atoi() and atol(), which can be used to convert ASCII-Z strings to numeric values. In C++ these functions are still available, but a more type-safe way to convert text to other types is to use objects of the class std::istringstream.

Using the std::istringstream class instead of the C standard conversion functions may have the advantage of type-safety, but it also appears to be a rather cumbersome alternative. After all, we will have to construct and initialize a std::istringstream object first, before we're actually able to extract a value of some type from it. This requires us to use a variable. Then, if the extracted value is actually only needed to initialize some function-parameter, one might wonder whether the additional variable and the istringstream construction can somehow be avoided.

In this section we'll develop a class (A2x) avoiding the disadvantages of the standard C library functions, without requiring the cumbersome definition of std::istringstream objects over and over again. The class is called A2x for `ascii to anything'.

A2x objects can be used to obtain a value of any type that is extractable from std::istream objects, given its textual representation. Since A2x represents the object variant of the C functions, it is not only type-safe but also extensible. Consequently, its use is greatly preferred over the standard C functions. Essentially, an A2x object is an istringstream object that converts its text to the requested type (internally using operator>>()) and that can be (re)assigned new text.

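A minimal sketch of such a class (the real A2x may differ in its details, but the idea is the same): an istringstream whose objects convert their text to any extractable type, either implicitly or through an explicit to<Type>() member:

    #include <sstream>

    class A2x: public std::istringstream
    {
        public:
            A2x(char const *text)           // e.g., A2x("12.50")
            :
                std::istringstream(text)
            {}

            template <typename Type>
            Type to()                       // explicit: A2x("12.45").to<int>()
            {
                Type ret = Type();
                *this >> ret;               // a failed extraction leaves ret
                return ret;                 // at its default value
            }

            template <typename Type>
            operator Type()                 // implicit: int x = A2x("12");
            {
                return to<Type>();
            }

            A2x &operator=(char const *text)    // reassign to new text
            {
                clear();                        // reset any error state
                str(text);                      // install the new text
                return *this;
            }
    };
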
Here are some examples showing its use:

    int x = A2x("12");          // initialize int x from a string "12"
    A2x a2x("12.50");           // explicitly create an A2x object

    double d;
    d = a2x;                    // assign a variable using an A2x object
    cout << d << endl;

    a2x = "err";
    d = a2x;                    // d is 0: the conversion failed,
    cout << d << endl;          // and a2x.good() == false

    a2x = " a";                 // reassign a2x to new text
    char c = a2x;               // c now 'a': internally operator>>() is used
    cout << c << endl;          // so initial blanks are skipped.

    int expectsInt(int x);      // initialize a parameter using an
    expectsInt(A2x("1200"));    // anonymous A2x object

    d = A2x("12.45").to<int>(); // d is 12, not 12.45
    cout << d << endl;

A complementary class (X2a), converting values to text, can easily be constructed as well. The construction of X2a is left as an exercise to the reader.

23.8: Wrappers for STL algorithms

Many generic algorithms (cf. chapter 19) use function objects to operate on the data to which their iterators refer, or they require predicate function objects using some criterion to make a decision about these data. The standard approach followed by the generic algorithms is to pass the information to which the iterators refer to overloaded function call operators (i.e., operator()()) of function objects that are passed as arguments to the generic algorithms.

Usually this approach requires the construction of a dedicated class implementing the required function object. However, in many cases the class context in which the iterators exist already offers the required functionality. Alternatively, the functionality might exist as a member function of the objects to which the iterators refer. For example, finding the first empty string object in a vector of string objects could profitably use the string::empty() member.

Another frequently encountered situation is related to a local context. Once again, consider the situation where the elements of a string vector are all visited: each object must be inserted into a stream whose reference is only known to the function in which the string elements are visited, but some additional information must be passed to the insertion function as well, making the use of a plain ostream_iterator less appropriate.

The frustrating part of using generic algorithms is that the required dedicated function objects often very much look like each other, while the standard solution (predefined function objects combined with specialized iterators) seldom does the required job: their fixed function interfaces (e.g., equal_to calling the object's operator==()) are often too rigid to be useful and, furthermore, they are unable to use any additional local context that is active when they are used.

One may wonder whether class templates might be constructed which can be used again and again to create dedicated function objects. Such class template instantiations should offer facilities to call configurable (member) functions using a configurable local context.

In the upcoming sections, several wrapper templates supporting these requirements are developed. To support a local context, a dedicated local context struct is introduced. Furthermore, the wrapper templates will allow us to specify at construction time the member function that should be called. Thus the rigidness of the fixed member function as used in the predefined function objects is avoided.

As an example of a generic algorithm usually requiring a simple function object, consider for_each(). The operator()() of the function object passed to this algorithm receives as its argument a reference to the object to which the iterators refer. Generally, operator()() will do one of two things:

Of course, the latter example is a bit of an overkill, since somefunction()'s address could have been passed to the generic algorithm directly, so why use this complex procedure? The answer is context: if somefunction() actually required additional arguments, representing the local context in which somefunction() is called, then the function object's constructor could have received that local context as its arguments, passing the local context on to somefunction(), together with the object received by the function object's operator()() function. There is no way to pass any local context to the generic algorithm's simple variant, in which a plain function's address is passed to the generic algorithm.
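To make this concrete, here is a small, purely illustrative sketch (the names Item, somefunction() and Caller are hypothetical): the plain-function variant cannot receive the ostream representing the local context, whereas a hand-written function object can pass it along:

    #include <iostream>
    #include <vector>
    #include <algorithm>

    struct Item                         // hypothetical data type
    {};

    inline void somefunction(Item const &, std::ostream &out)
    {
        out << "processing an Item\n";  // hypothetical processing
    }

    class Caller                        // hand-written function object
    {
        std::ostream &d_out;            // the local context

        public:
            Caller(std::ostream &out)
            :
                d_out(out)
            {}
            void operator()(Item const &item) const
            {
                somefunction(item, d_out);      // pass the context along
            }
    };

    void process(std::vector<Item> const &items, std::ostream &out)
    {
        // for_each(items.begin(), items.end(), somefunction) is impossible:
        // there is no way to pass `out' to somefunction()
        std::for_each(items.begin(), items.end(), Caller(out));
    }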

At first sight, however, the fact that a local context differs from one situation to another makes it hard to standardize the local context: a local context might consist of values, pointers, references, which differ in number and types from one situation to another. Defining templates for all possible situations is clearly impractical, and using C-style variadic functions is also not very attractive, since the arguments passed to a variadic function object constructor cannot simply be passed on to the function object's operator()().

The concept of a local context struct is introduced to standardize the local context. It is based on the following considerations:

Of course, the organization of local contexts will differ from one situation to the next, but there is always just one local context required. The fact that the inner organization of the local context differs from one situation to the next causes no difficulty at all for C++'s template mechanism. Actually, having a generic type (Context) available together with several concrete instantiations of that generic type is a text-book argument for using templates.

23.8.1: Local context structs

When a function is called, the context in which it operates is made known to it through its parameter list: when the function is called, these parameters are initialized by the function's arguments. For example, a function show() may expect two arguments: an ostream into which information is inserted and an object that is inserted into that stream:
    void State::show(ostream &out, Item const &item)
    {
        out << "Here is item " << item.nr() << ":\n" <<
                item << endl;
    }
Of course, functions differ greatly in their parameter lists: both the numbers and types of their parameters vary.

A local context struct is used to standardize the parameter lists of functions, for the benefit of template construction. In the above example, the function State::show() uses a local context consisting of an ostream & and an Item const &. This context never changes, and may be offered through a struct defined as follows:

    struct ShowContext
    {
        ostream &out;
        Item const &item;
    };
Note how this struct mimics State::show()'s parameter list. Since it is directly connected to the function State::show() it is best defined in the class State. Once we have defined this struct, State::show()'s implementation is modified. It now expects a ShowContext &:
    void State::show(ShowContext &context)
    {
        context.out << "Here is item " << context.item.nr() << ":\n" <<
                context.item << endl;
    }
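A caller then creates the context struct and passes it as the single argument. A small sketch (the calling function State::process() is purely hypothetical):

    void State::process(ostream &out, Item const &item)
    {
        ShowContext context = {out, item};  // wraps the old parameter list
        show(context);                      // the standardized, single
    }                                       // parameter call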
Using a local context struct, any parameter list (except those of variadic functions) can be standardized to a parameter list consisting of a single element. Now that we have a single parameter to specify any local context, we're ready for the `templatization' of function object wrapper classes.

23.8.2: Member functions called from function objects

The function operator (operator()()) is the characteristic member of function objects. It may be defined with various parameters; in the context of generic algorithms it usually has one or two parameters, referring to the data to be processed by the algorithm.

When constructing the class template we should be aware of the fact that it is not known beforehand whether these parameters are objects, primitive types, pointers or references. Knowing this, however, is important when the function object instantiated from the class template is to be used by generic algorithms, since many generic algorithms require that various types are defined by the function object. For example, generic algorithms like for_each, calling function objects with a single argument, expect that these function objects define the types argument_type and result_type, referring to the plain types of, respectively, the function operator's argument and return value. Analogously, generic algorithms like includes (cf. section 19.1.1) expect that function objects define the types first_argument_type, second_argument_type and result_type.

To determine the plain type of a template type parameter a trait class can be used. In section 22.4 the trait class TypeTrait was introduced, allowing templates to determine various characteristics of their template type parameters. This TypeTrait class can profitably be used here to determine the proper definitions of the types required by generic algorithms.

Let's assume that we would like to create a function object changing all letters in string objects into capital letters. Clearly we will have to access the string's individual characters. However, the strings themselves may be made available through references (e.g., when iterating over the elements of a vector<string>), but also through pointers (e.g., when iterating over the elements of a vector<string *>).

The template class providing the function object should not be responsible for the actions to be performed. Rather, executing the required actions should be deferred to a function, which can be specified when the template is instantiated. If that function is a member function it should be a static member function: the wrapper class developed below stores and calls a plain function pointer, and a pointer to a non-static member function cannot be called without an object, so it cannot be used where a plain function pointer is expected.

Generic algorithms also differ in the way they use the function object's return value. This turns out to be no problem: templates allow us to parameterize the return types of functions.

23.8.3: The unary argument context sensitive Function Object template

As an opening example, let's assume we have a class Strings holding a vector<string> d_vs data member. We would like to change all occurrences of a specified set of letters found in the strings stored in d_vs into uppercase characters, and we would like to insert the original and modified strings into a configurable ostream object. To accomplish this, our class offers a member uppercase(ostream &out, char const *letters).

We would like to use the for_each() generic algorithm. This algorithm may be given a function object. The function object will be initialized with a local context consisting of the ostream object and the set of letters to be used. The following support class is constructed:

    class Support
    {
        std::ostream &d_out;
        std::string d_letters;

        public:
            Support(std::ostream &out, char const *letters);
            void operator()(std::string &str) const;
    };

    inline Support::Support(std::ostream &out, char const *letters)
    :
        d_out(out),
        d_letters(letters)
    {}

    inline void Support::operator()(std::string &str) const
    {
        d_out << str << " ";

        for
        (
            std::string::iterator strIter = str.begin();
                strIter != str.end();
                    ++strIter
        )
        {
            if (d_letters.find(*strIter) != std::string::npos)
                *strIter = toupper(*strIter);
        }
        d_out << str << std::endl;
    }
Note that the implementation of operator()() contains another for statement, which could itself be replaced by another for_each() call. This suggests that either another function object class must be constructed or that an overloaded version of operator()() must be defined. Neither alternative is very attractive: constructing large numbers of small function object classes soon becomes a nuisance, the parameter types impose a limit on the number of overloaded operator()() members that can be defined, and the name `operator()()' itself has hardly any self-documenting value.

For now an anonymous Support class object is used in the implementation of the class Strings. Here is an example of its definition and use:

    #include <iostream>
    #include <string>
    #include <vector>
    #include <algorithm>

    #include "support.h"

    class Strings
    {
        std::vector<std::string> d_vs;

        public:
            void uppercase(std::ostream &out, char const *letters);
    };

    void Strings::uppercase(std::ostream &out, char const *letters)
    {
        for_each(d_vs.begin(), d_vs.end(), Support(out, letters));
    }

    using namespace std;

    int main(int argc, char **argv)
    {
        Strings s;

        s.uppercase(cout, argv[1]);
    }

To `templatize' the Support class using the considerations discussed previously, we perform the following steps:

Now we're ready to construct the full class template FnWrap1c, using the TypeTrait class to determine the argument_type and result_type:
#include "typetrait.h"

template <typename Type, typename Context, typename ReturnType = void>
class FnWrap1c
{
    ReturnType (*d_fun)(Type, Context);
    Context d_context;

    public:
        typedef typename TypeTrait<Type>::Plain         argument_type;
        typedef typename TypeTrait<ReturnType>::Plain   result_type;

        FnWrap1c(ReturnType fun(Type, Context), Context context);
        ReturnType operator()(Type param) const;
};

template <typename Type, typename Context, typename ReturnType>
FnWrap1c<Type, Context, ReturnType>::FnWrap1c(
                ReturnType fun(Type, Context), Context context)
:
    d_fun(fun),
    d_context(context)
{}

template <typename Type, typename Context, typename ReturnType>
inline ReturnType FnWrap1c<Type, Context, ReturnType>::operator()(Type param)
                                                                  const
{
    return (*d_fun)(param, d_context);
}

Now the template can be used. The class Support is abandoned. Instead of using a separate class, all members required to transform the strings of Strings objects are defined as static members inside Strings itself. This neatly localizes the actions where they belong: inside the class that wants to transform its own data. So, the original dedicated implementation of Support::operator()() is now defined as a static member xform inside the class Strings. The class also defines a context struct Context, containing a reference to the ostream to use and the set of characters to capitalize. The public function uppercase now simply initializes the context struct and creates an FnWrap1c object calling xform, which is then passed to the for_each() call. Note that FnWrap1c's template type arguments exactly mirror xform's parameter types. This is a general characteristic of the function object wrappers introduced in this and the coming section.

It is the purpose of xform to transform the characters of an individual string. What better procedure than to call for_each() again, this time using the string's begin() and end() members to define another iterator range. In this case the letter set must be known, so it is passed as the local context to the FnWrap1c object defined inside xform. This latter FnWrap1c object calls the static function foundToUpper to capitalize individual characters of the strings where necessary. Here is the new implementation of the class Strings, now using FnWrap1c:

    #include <iostream>
    #include <string>
    #include <vector>
    #include <algorithm>
    #include <cctype>       // for toupper()

    #include "fnwrap1.h"

    class Strings
    {
        std::vector<std::string> d_vs;

        struct Context
        {
            std::ostream &out;
            std::string letters;
        };

        public:
            void uppercase(std::ostream &out, char const *letters);

        private:
            static void xform(std::string &str, Context &context);
            static void foundToUpper(char &ch, std::string const &letters);
    };

    void Strings::uppercase(std::ostream &out, char const *letters)
    {
        Context context = {out, letters};

        for_each(d_vs.begin(), d_vs.end(),
            FnWrap1c<std::string &, Context &>(xform, context));
    }

    void Strings::xform(std::string &str, Context &context)
    {
        context.out << str << " ";

        for_each(str.begin(), str.end(),
            FnWrap1c<char &, std::string const &>(
                                        foundToUpper, context.letters));

        context.out << str << std::endl;
    }

    void Strings::foundToUpper(char &ch, std::string const &letters)
    {
        if (letters.find(ch) != std::string::npos)
            ch = toupper(ch);
    }

    using namespace std;

    int main(int argc, char **argv)
    {
        Strings s;

        s.uppercase(cout, argv[1]);
    }

To illustrate the use of the ReturnType template parameter, let's assume that the transformations are only required up to the first empty string. In this case, the find_if generic algorithm comes in handy, since it stops once a predicate returns true. In that case the xform() function should return a bool value, and the uppercase() implementation specifies an explicit type (bool) for the ReturnType template parameter. The next implementation of the class Strings merely shows the required modifications, the remainder is not altered:

    class Strings
    {
        private:
            static bool xform(std::string &str, Context &context);
    };

    void Strings::uppercase(std::ostream &out, char const *letters)
    {
        Context context = {out, letters};

        find_if(d_vs.begin(), d_vs.end(),
            FnWrap1c<std::string &, Context &, bool>(xform, context));
    }

    bool Strings::xform(std::string &str, Context &context)
    {
        if (str.empty())
            return true;

        // previous implementation (not modified)

        return false;
    }
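As an aside, find_if also returns an iterator to the element for which the predicate returned true (or end() if there is none). If uppercase() should, e.g., report how many strings were transformed before the first empty one was encountered, that iterator could be used as well. A sketch, based on the class shown above:

    void Strings::uppercase(std::ostream &out, char const *letters)
    {
        Context context = {out, letters};

        std::vector<std::string>::iterator it =
            find_if(d_vs.begin(), d_vs.end(),
                FnWrap1c<std::string &, Context &, bool>(xform, context));

        out << (it - d_vs.begin()) << " string(s) were transformed\n";
    }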

23.8.4: The binary argument context sensitive Function Object template

Having constructed the unary template wrapper, the construction of the binary template wrapper should offer no surprises. The function object's operator()() is now called with two, rather than one, arguments. Coining the classname FnWrap2c, its implementation is almost identical to FnWrap1c's implementation, and it can be found in the Bobcat library. An example showing its use is:
    // accumulating strings from a vector to one big string, using
    // `accumulate'.
    #include <iostream>
    #include <numeric>
    #include <string>
    #include <vector>
    #include <bobcat/fnwrap2c>

    using namespace std;
    using namespace FBB;

    class Strings
    {
        vector<string> d_vs;

        public:
            Strings()
            {
                d_vs.push_back("one");
                d_vs.push_back("two");
                d_vs.push_back("three");
            }

            void display(ostream &out) const
            {
                SContext c = {1, out};

                cout << "On Exit: " <<
                    accumulate(
                        d_vs.begin(), d_vs.end(),
                        string("HI"),
                        FnWrap2c<string const &, string const &,
                                 SContext &, string>(&show, c)
                    ) <<
                    endl;
            }

        private:
            struct SContext
            {
                size_t nr;
                ostream &out;
            };

            static string show(string const &str1,
                                    string const &str2,
                                    SContext &c)
            {
                c.out << c.nr++ << " " << str1 << " " << str2 <<
                         endl;
                return str1 + " " + str2;
            }
    };

    int main()
    {
        Strings s;
        s.display(cout);
    }
As with the unary template wrapper (see section 23.8.3), an additional class may be defined that does not require a local context. After compilation and linking, just run the program without any arguments. It requires Bobcat's fnwrap2c and typetrait header files.
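Bobcat's actual fnwrap2c header is not reproduced here. The following sketch merely suggests what a binary wrapper along the lines of FnWrap1c could look like; the class found in the library may differ in its details:

#include "typetrait.h"

template <typename Type1, typename Type2, typename Context,
          typename ReturnType = void>
class FnWrap2c
{
    ReturnType (*d_fun)(Type1, Type2, Context);
    Context d_context;

    public:
        typedef typename TypeTrait<Type1>::Plain        first_argument_type;
        typedef typename TypeTrait<Type2>::Plain        second_argument_type;
        typedef typename TypeTrait<ReturnType>::Plain   result_type;

        FnWrap2c(ReturnType fun(Type1, Type2, Context), Context context)
        :
            d_fun(fun),
            d_context(context)
        {}

        ReturnType operator()(Type1 first, Type2 second) const
        {
            return (*d_fun)(first, second, d_context);
        }
};

Its template type arguments again exactly mirror the parameter types of the function passed to its constructor, as illustrated by the show() function in the example above.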

23.9: Using `bisonc++' and `flex'

The example discussed in this section digs into the peculiarities of using a parser- and scanner generator generating C++ sources. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator to create the code which does the actual input recognition.

The current example assumes that the reader knows how to use the scanner generator flex and the parser generator bison. Both bison and flex are well documented elsewhere. The original predecessors of bison and flex, called yacc and lex, are described in several books, e.g., in O'Reilly's book `lex & yacc'.

However, scanner- and parser generators are also (and maybe even more commonly, nowadays) available as free software. Both bison and flex are usually part of software distributions or they can be obtained from ftp://prep.ai.mit.edu/pub/non-gnu. Flex creates a C++ class when %option c++ is specified.

For parser generators the program bison is available. Back in the early 90's Alain Coetmeur (coetmeur@icdc.fr) created a C++ variant ( bison++) creating a parser class. Although the bison++ program produces code that can be used in C++ programs, it also shows many characteristics that are more appropriate in a C context than in a C++ context. In January 2005 I rewrote parts of Alain's bison++ program, resulting in the original version of the program bisonc++. Then, in May 2005 a complete rewrite of the bisonc++ parser generator was completed, which is available on the Internet with version numbers 0.98 and beyond. Bisonc++ can be downloaded from http://bisoncpp.sourceforge.net/, where it is available as a source archive and as an (i386) Debian binary package (including bisonc++'s documentation).

Bisonc++ creates a cleaner parser class setup than bison++. In particular, it derives the parser class from a base-class, containing the parser's token- and type-definitions as well as all member functions which should not be (re)defined by the programmer. Most of these members might also be defined directly in the parser class. Because of this approach the resulting parser class is very small, declaring only members that are actually defined by the programmer (as well as some other members, generated by bisonc++ itself, implementing the parser's parse() member). Actually, parse() is initially the only public member of bisonc++'s generated parser class. Remaining members are private. The only member which is not implemented by default is lex(), producing the next lexical token. When the directive %scanner (see section 23.9.2.1) is used, bisonc++ generates a standard implementation for this member; otherwise it must be implemented by the programmer.

In this section of the Annotations we will focus on bisonc++ as our parser generator.

Using flex and bisonc++, class-based scanners and parsers can be generated. The advantage of this approach is that the interfaces to the scanner and the parser tend to become cleaner than without using the class interface. Furthermore, classes allow us to get rid of most, if not all, global variables, making it easy to use multiple parsers in one program.

Below two examples are elaborated. The first example only uses flex. The scanner it generates monitors the production of a file from several parts. This example focuses on the lexical scanner, and on switching files while churning through the information. The second example uses both flex and bisonc++ to generate a scanner and a parser transforming standard arithmetic expressions to their postfix notations, commonly used in code generated by compilers and in HP-calculators. In the second example the emphasis is mainly on bisonc++ and on composing a scanner object inside a generated parser.

23.9.1: Using `flex' to create a scanner

The lexical scanner developed in this section is used to monitor the production of a file from several subfiles. The setup is as follows: the input-language knows of an #include directive, followed by a text string specifying the file (path) which should be included at the location of the #include.

In order to avoid complexities irrelevant to the current example, the format of the #include statement is restricted to the form #include <filepath>. The file specified between the angle brackets should be available at the location indicated by filepath. If the file is not available, the program terminates after issuing an error message.

The program is started with one or two filename arguments. If the program is started with just one filename argument, the output is written to the standard output stream cout. Otherwise, the output is written to the stream whose name is given as the program's second argument.

The program defines a maximum nesting depth. Once this maximum is exceeded, the program terminates after issuing an error message. In that case, the stack of filenames, indicating where each file was included, is printed.

One additional feature is that (standard C++) comment-lines are ignored. So, include directives in comment-lines are ignored too.

The program is created along the following steps:

23.9.1.1: The derived class `Scanner'

The code associated with the regular expression rules is located inside the class yyFlexLexer. However, we would of course want to use the derived class's members in this code. This causes a little problem: how does a base-class member know about members of classes derived from it?

Fortunately, inheritance helps us to implement this. In the specification of the class yyFlexLexer we notice that the function yylex() is a virtual function. The header file FlexLexer.h declares the virtual member int yylex():

    class yyFlexLexer: public FlexLexer
    {
        public:
            yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 );

            virtual ~yyFlexLexer();

            void yy_switch_to_buffer( struct yy_buffer_state* new_buffer );
            struct yy_buffer_state* yy_create_buffer( istream* s, int size );
            void yy_delete_buffer( struct yy_buffer_state* b );
            void yyrestart( istream* s );

            virtual int yylex();

            virtual void switch_streams( istream* new_in, ostream* new_out );
    };
As this function is a virtual function it can be overridden in a derived class. In that case the overridden function will be called from its base class (i.e., yyFlexLexer) code. Since the derived class's yylex() is called, it will now have access to the members of the derived class, and also to the public and protected members of its base class.

By default, the context in which the generated scanner is placed is the function yyFlexLexer::yylex(). This context changes if we use a derived class, e.g., Scanner. To derive Scanner from yyFlexLexer, generated by flex, do as follows:

Looking at the regular expressions themselves, notice that we need rules to recognize comments, #include directives, and all remaining characters. This is all fairly standard practice. When an #include directive is detected, the directive is parsed by the scanner. This too is common practice. Here is what our lexical scanner will do:

The lexical scanner specification file is organized similarly to the specification files used for flex in C contexts. However, for C++ contexts flex may create a class ( yyFlexLexer) from which another class (e.g., Scanner) can be derived. The flex specification file itself has three sections:

Since the derived class's members may now access the information stored within the lexical scanner itself (they can even access the information directly, since the data members of yyFlexLexer are protected, and thus accessible to derived classes), most processing can be left to the derived class's member functions. This results in a very clean setup of the lexer specification file, requiring no or hardly any code in the preamble.

23.9.1.2: Implementing `Scanner'

The class Scanner is derived, as usual, from the class yyFlexLexer that is generated by flex(1). The derived class has access to the data controlled by the lexical scanner. In particular, it has access to several of yyFlexLexer's protected data members; other members are available as well, but they are used less often. Details can be found in FlexLexer.h.

Objects of the class Scanner perform two tasks:

Several member functions are used to accomplish these tasks. As they are auxiliary to the scanner, they are private members. In practice, such private members are developed once the need for them arises. Note that, apart from the private member functions, several private data members are defined as well. Let's have a closer look at the class Scanner:
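The full implementation is not reproduced in this section. To give an impression of the class, here is a sketch of what its interface (in lexer.h) could look like. Only Error, yylex(), lastFile() and stackTrace() (and lineno(), inherited from the FlexLexer base class) are actually used by the program shown in the next section; all other names below are merely illustrative:

    #include <iostream>
    #include <string>
    #include <vector>
    #include <FlexLexer.h>          // defines yyFlexLexer

    class Scanner: public yyFlexLexer
    {
        public:
            enum Error              // thrown when the scanner fails; the
            {                       // enumerators match the messages used
                INVALID_INCLUDE,    // in main() (illustrative names)
                CIRCULAR_INCLUSION,
                NESTING,
                READ,
            };

            Scanner(std::istream *in, std::string const &initialName);

            int yylex();            // the scanning function, implemented
                                    // by the code generated by flex

            std::string const &lastFile() const;    // most recently used file
            void stackTrace() const;                // shows the #include stack

        private:                    // illustrative helpers and data
            void pushFile(std::string const &name);
            void popFile();

            std::vector<std::string> d_fileStack;
            std::string d_lastFile;
    };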

23.9.1.3: Using a `Scanner' object

The program using our Scanner is very simple. It expects a filename indicating where to start the scanning process. Initially the number of arguments is checked. If at least one argument was given, then an ifstream object is created. If this object can be created, then a Scanner object is constructed, receiving the address of the ifstream object and the name of the initial input file as its arguments. Then the Scanner object's yylex() member is called. The scanner object throws Scanner::Error exceptions if it fails to perform its tasks properly. These exceptions are caught near main()'s end. Here is the program's source:
    #include "lexer.h"
    using namespace std;

    int main(int argc, char **argv)
    {
        if (argc == 1)
        {
            cerr << "Filename argument required\n";
            return 1;
        }

        ifstream yyin(argv[1]);
        if (!yyin)
        {
            cerr << "Can't read " << argv[1] << endl;
            return 1;
        }

        Scanner scanner(&yyin, argv[1]);
        try
        {
            return scanner.yylex();
        }
        catch (Scanner::Error err)
        {
            char const *msg[] =
            {
                "Include specification",
                "Circular Include",
                "Nesting",
                "Read",
            };
            cerr << msg[err] << " error in " << scanner.lastFile() <<
                                ", line " << scanner.lineno() << endl;
            scanner.stackTrace();
            return 1;
        }
        return 0;
    }

23.9.1.4: Building the program

The final program is constructed in two steps. These steps are given for a Unix system on which flex and the Gnu C++ compiler g++ have been installed.

For the purpose of debugging a lexical scanner, the matched rules and the returned tokens provide useful information. When %option debug is specified, debugging code is included in the generated scanner. To obtain debugging info, this code must also be activated. Assuming the scanner object is called scanner, the statement
        scanner.set_debug(true);
will produce debugging info to the standard error stream.
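The two construction steps mentioned above could, for example, consist of the following commands; the file names used here are illustrative and depend on the %option outfile specification and on how the sources were organized:

    flex -I lexer                   # generates the scanner's source (yylex.cc)
    g++ -o monitor -Wall *.cc -s    # compiles and links all sources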

23.9.2: Using both `bisonc++' and `flex'

When an input language exceeds a certain level of complexity, a parser is often used to control the complexity of the input language. In this case, a parser generator can be used to generate the code verifying the input's grammatical correctness. The lexical scanner (preferably composed into the parser) provides chunks of the input, called tokens. The parser then processes the series of tokens generated by its lexical scanner.

The starting point when developing programs that use both parsers and scanners is the grammar. The grammar defines a set of tokens which can be returned by the lexical scanner (commonly called the lexer).

Finally, auxiliary code is provided to `fill in the blanks': the actions performed by the parser and by the lexer are normally not specified literally in the grammar rules or lexical regular expressions, but are implemented in member functions, which are called from the parser's rules or are associated with the lexer's regular expressions.

In the previous section we've seen an example of a C++ class generated by flex. In the current section we concentrate on the parser. The parser can be generated from a grammar specification, processed by the program bisonc++. The grammar specification required for bisonc++ is similar to the specification required for bison (and the existing program bison++, written in the early nineties by the Frenchman Alain Coetmeur), but bisonc++ generates a C++ class which more closely follows present-day standards than bison++, which still shows many C-like features.

In this section a program is developed converting infix expressions, in which binary operators are written between their operands, to postfix expressions, in which binary operators are written behind their operands. Furthermore, the unary operator - will be converted from its prefix notation to a postfix form. The unary + operator is ignored as it requires no further actions. In essence our little calculator is a micro compiler, transforming numeric expressions into assembly-like instructions.

Our calculator will recognize a very basic set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in bison-like grammar specifications. That's all. The purpose of this section is, after all, to illustrate the construction of a C++ program that uses both a parser and a lexical scanner, rather than to construct a full-fledged calculator.

In the coming sections we'll develop the grammar specification for bisonc++. Then, the regular expressions for the scanner are specified according to flex' requirements. Finally the program is constructed.

23.9.2.1: The `bisonc++' specification file

The grammar specification file required by bisonc++ is comparable to the specification file required by bison. Differences are related to the class nature of the resulting parser. Our calculator will distinguish real numbers from integers, and will support a basic set of arithmetic operators.

Bisonc++ should be used as follows:

The bisonc++ specification file consists of two sections:

Readers familiar with bison should note that there is no header section anymore. Header sections are used by bison to provide for the necessary declarations allowing the compiler to compile the C function generated by bison. In C++ declarations are part of or already used by class definitions. Therefore, a parser generator generating a C++ class and some of its member functions does not require a header section anymore.

The declaration section

The declaration section contains several declarations, among which all tokens used in the grammar and the priority rules of the mathematical operators. Moreover, several new and important specifications can be used here. Those that are relevant to our current example and only available in bisonc++ are discussed here; the reader is referred to bisonc++'s man-page for a full description. An example of a %union declaration is:
    %union
    {
        int     i;
        double  d;
    };
A union cannot contain objects as its fields, as constructors cannot be called when a union is created. This means that a string cannot be a member of the union. A string *, however, is a possible union member. By the way: the lexical scanner does not have to know about such a union. The scanner can simply pass its scanned text to the parser through its YYText() member function. For example, using a statement like
    $$.i = A2x(scanner.YYText());
matched text may be converted to a value of an appropriate type.

Tokens and non-terminals can be associated with union fields. This is strongly advised, as it prevents type mismatches, since the compiler will be able to check for type correctness. At the same time, the bison-specific variables $$, $1, $2, etc. may be used, rather than the full field specification (like $$.i). A non-terminal or a token may be associated with a union field using the <fieldname> specification. E.g.,

    %token <i> INT          // token association (deprecated, see below)
           <d> DOUBLE
    %type  <i> intExpr      // non-terminal association
In the example developed here, note that both the tokens and the non-terminals can be associated with a field of the union. However, as noted before, the lexical scanner does not have to know about all this. In our opinion, it is cleaner to let the scanner do just one thing: scan texts. The parser, knowing what the input is all about, may then convert strings like "123" to an integer value. Consequently, the association of a union field and a token is discouraged. In the upcoming description of the rules of the grammar this will be illustrated further.

In the %union discussion the %token and %type specifications should be noted. They are used to specify the tokens ( terminal symbols) that can be returned by the lexical scanner, and to specify the return types of non-terminals. Apart from %token the token indicators %left, %right and %nonassoc may be used to specify the associativity of operators. The tokens mentioned at these indicators are interpreted as tokens indicating operators, associating in the indicated direction. The precedence of operators is given by their order: the first specification has the lowest priority. To overrule a certain precedence in a certain context, %prec can be used. As all this is standard bisonc++ practice, it isn't further elaborated here. The documentation provided with bisonc++'s distribution should be consulted for further reference.

Here is the specification of the calculator's declaration section:

%filenames parser
%scanner ../scanner/scanner.h
%lines

%union {
    int i;
    double d;
};

%token  INT
        DOUBLE

%type   <i> intExpr
%type   <d> doubleExpr

%left   '+'
%left   '*'
%right  UnaryMinus

In the declaration section %type specifiers are used, associating the intExpr rule's value (see the next section) with the i-field of the semantic-value union, and associating doubleExpr's value with the d-field. At first sight this may look complex, since the expression rules must be included for each individual return type. On the other hand, if the union itself had been used, we would still have had to specify somewhere in the returned semantic values which field to use: fewer rules, but more complex and error-prone code.

The grammar rules

The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. There are quite a few rules, but they illustrate various features offered by bisonc++. In particular, note that no action block requires more than a single line of code. This keeps the organization of the grammar relatively simple, and therefore enhances its readability and understandability. Even the rule defining the parser's proper termination (the empty line in the line rule) uses a single member function call done(). The implementation of that function is simple, but interesting in that it calls Parser::ACCEPT(), showing that the ACCEPT() member can be called indirectly from a production rule's action block. Here are the grammar's production rules:
    lines:
        lines
        line
    |
        line
    ;

    line:
        intExpr
        '\n'
        {
            display($1);
        }
    |
        doubleExpr
        '\n'
        {
            display($1);
        }
    |
        '\n'
        {
            done();
        }
    |
        error
        '\n'
        {
            reset();
        }
    ;

    intExpr:
        intExpr '*' intExpr
        {
            $$ = exec('*', $1, $3);
        }
    |
        intExpr '+' intExpr
        {
            $$ = exec('+', $1, $3);
        }
    |
        '(' intExpr ')'
        {

            $$ = $2;
        }
    |
        '-' intExpr         %prec UnaryMinus
        {
            $$ = neg($2);
        }
    |
        INT
        {
            $$ = convert<int>();
        }
    ;

    doubleExpr:
        doubleExpr '*' doubleExpr
        {
            $$ = exec('*', $1, $3);
        }
    |
        doubleExpr '*' intExpr
        {
            $$ = exec('*', $1, d($3));
        }
    |
        intExpr '*' doubleExpr
        {
            $$ = exec('*', d($1), $3);
        }
    |
        doubleExpr '+' doubleExpr
        {
            $$ = exec('+', $1, $3);
        }
    |
        doubleExpr '+' intExpr
        {
            $$ = exec('+', $1, d($3));
        }
    |
        intExpr '+' doubleExpr
        {
            $$ = exec('+', d($1), $3);
        }
    |
        '(' doubleExpr ')'
        {
            $$ = $2;
        }
    |
        '-' doubleExpr         %prec UnaryMinus
        {

            $$ = neg($2);
        }
    |
        DOUBLE
        {
            $$ = convert<double>();
        }
    ;

The above grammar is used to implement a simple calculator in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The grammar shows the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr is linked to integer values. Precedence and type association is defined in the parser's definition section.

The Parser's header file

Various functions called from the grammar are defined as template functions. Bisonc++ generates various files, among which the file defining the parser's class. Functions called from the production rule's action blocks are usually member functions of the parser, and these member functions must be declared and defined. Once bisonc++ has generated the header file defining the parser's class it will not automatically rewrite that file, allowing the programmer to add new members to the parser class. Here is the parser.h file as used for our little calculator:
#ifndef Parser_h_included
#define Parser_h_included

#include <iostream>
#include <sstream>
#include <bobcat/a2x>

#include "parserbase.h"
#include "../scanner/scanner.h"


#undef Parser
class Parser: public ParserBase
{
    std::ostringstream d_rpn;
    // $insert scannerobject
    Scanner d_scanner;

    public:
        int parse();

    private:
        template <typename Type>
            Type exec(char c, Type left, Type right);
        template <typename Type>
            Type neg(Type op);
        template <typename Type>
            Type convert();

        void display(int x);
        void display(double x);
        void done() const;
        void reset();
        void error(char const *msg);
        int lex();
        void print();

        static double d(int i);

    // support functions for parse():

        void executeAction(int d_ruleNr);
        void errorRecovery();
        int lookup(bool recovery);
        void nextToken();
};


inline double Parser::d(int i)
{
    return i;
}

template <typename Type>
Type Parser::exec(char c, Type left, Type right)
{
    d_rpn << " " << c << " ";
    return c == '*' ? left * right : left + right;
}

template <typename Type>
Type Parser::neg(Type op)
{
    d_rpn << " n ";
    return -op;
}

template <typename Type>
Type Parser::convert()
{
    Type ret = FBB::A2x(d_scanner.YYText());
    d_rpn << " " << ret << " ";
    return ret;
}

inline void Parser::error(char const *msg)
{
    std::cerr << msg << std::endl;
}

inline int Parser::lex()
{
    return d_scanner.yylex();
}

inline void Parser::print()
{}

#endif
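The members display(), done() and reset(), declared in the header above, are written by the programmer; their implementations are not reproduced here. Purely illustrative implementations, merely showing how the d_rpn buffer and the ACCEPT() member (mentioned earlier) could be used, might look like this; the actual example's code may differ:

inline void Parser::display(int x)
{
    std::cout << d_rpn.str() << " = " << x << '\n';     // show the collected
    d_rpn.str("");                                      // postfix expression,
}                                                       // then reset it

inline void Parser::display(double x)
{
    std::cout << d_rpn.str() << " = " << x << '\n';
    d_rpn.str("");
}

inline void Parser::done() const
{
    ACCEPT();               // terminate parsing (see the `line' rule)
}

inline void Parser::reset()
{
    d_rpn.str("");          // discard the expression collected so far
}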

23.9.2.2: The `flex' specification file

The flex-specification file used by our calculator is simple: blanks are skipped, single characters are returned, and numeric values are returned as either Parser::INT or Parser::DOUBLE tokens. Here is the complete flex specification file:
%{
    #define _SKIP_YYFLEXLEXER_
    #include "scanner.ih"

    #include "../parser/parserbase.h"
%}

%option yyclass="Scanner" outfile="yylex.cc" c++ 8bit warn noyywrap
%option debug

%%

[ \t]                       ;
[0-9]+                      return Parser::INT;

"."[0-9]*                   |
[0-9]+("."[0-9]*)?          return Parser::DOUBLE;

.|\n                        return *yytext;

%%

23.9.2.3: Generating code

The code is generated in the same way as with bison and flex. In order to have bisonc++ generate the files parser.cc and parser.h, issue the command:
    bisonc++ -V grammar
The option -V will generate the file parser.output showing information about the internal structure of the provided grammar, among which its states. It is useful for debugging purposes, and can be left out of the command if no debugging is required. Bisonc++ may detect conflicts ( shift-reduce conflicts and/or reduce-reduce conflicts) in the provided grammar. These conflicts may be resolved explicitly using disambiguation rules or they are `resolved' by default. A shift-reduce conflict is resolved by shifting, i.e., the next token is consumed. A reduce-reduce conflict is resolved by using the first of two competing production rules. Bisonc++ uses identical conflict resolution procedures as bison and bison++.

Once a parser class and its parsing member function have been constructed, flex may be used to create a lexical scanner (in, e.g., the file yylex.cc) using the command

    flex -I lexer

On Unix systems a command comparable to:

    g++ -o calc -Wall *.cc -s
can be used to compile and link both the source of the main program and the generated sources. Finally, here is a source file in which the main() function and the parser object are defined. The parser features the lexical scanner as one of its data members:
#include "parser/parser.h"

using namespace std;

int main()
{
    Parser parser;

    cout << "Enter (nested) expressions containing ints, doubles, *, + and "
            "unary -\n"
            "operators. Enter an empty line to stop.\n";

    return parser.parse();
}

Bisonc++ can be downloaded from http://bisoncpp.sourceforge.net/. It requires the bobcat library, which can be downloaded from http://bobcat.sourceforge.net/.

23.9.3: Using polymorphic semantic values with Bisonc++

Below, the way in which Bisonc++ may use a polymorphic semantic value is discussed. The approach discussed here is a direct result of a suggestion initially brought forward by Dallas A. Clement in September 2007.

One may wonder why a union is still used by Bisonc++, as C++ offers inherently superior approaches to combining multiple types into one type. The C++ way to combine types into one type is to define a polymorphic base class and a series of derived classes implementing the alternative data types. Bisonc++ supports the union approach for backward compatibility reasons: both bison(1) and bison++ support the %union directive.

The alternative to using a union is using a polymorphic base class. This class will be developed below as the class Base. Since it's a polymorphic base class it should have the following characteristics:

The class Base has the following interface:

    #ifndef INCLUDED_BASE_
    #define INCLUDED_BASE_

    #include <iosfwd>

    class Base
    {
        public:
            virtual ~Base();
            virtual Base *clone() const = 0;
            virtual std::ostream &insert(std::ostream &os) const = 0;
    };

    inline Base::~Base()
    {}

    inline std::ostream &operator<<(std::ostream &out, Base const &obj)
    {
        return obj.insert(out);
    }

    #endif

The alternatives as defined by a classical union are now defined by classes derived from the class Base. For example:
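The derived classes themselves are not reproduced here. Their names, Int and Text, are used by the scanner rules shown in section 23.9.3.2, which suggests sketches like the following (using A2x, introduced in section 23.7, for the text-to-int conversion; the actual classes may be organized differently):

    #include <string>
    #include <ostream>
    #include <bobcat/a2x>
    #include "../base/base.h"

    class Int: public Base
    {
        int d_value;

        public:
            Int(char const *text)
            :
                d_value(FBB::A2x(text))
            {}
            int value() const
            {
                return d_value;
            }
            virtual Base *clone() const
            {
                return new Int(*this);
            }
            virtual std::ostream &insert(std::ostream &out) const
            {
                return out << d_value;
            }
    };

    class Text: public Base
    {
        std::string d_text;

        public:
            Text(char const *text)
            :
                d_text(text)
            {}
            virtual Base *clone() const
            {
                return new Text(*this);
            }
            virtual std::ostream &insert(std::ostream &out) const
            {
                return out << d_text;
            }
    };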

The polymorphic Base, however, can't be used directly as the parser's semantic value type: the parser copies semantic values onto and from its semantic value stack, which is impossible for an abstract base class and would slice off the derived class parts anyway, while storing plain Base pointers would leave unanswered the question of who is responsible for destroying the objects they point to.

To solve the above problems, a wrapper class Semantic around a Base pointer is used. The wrapper class will take care of the proper destruction of objects accessed via its Base pointer data member and it will offer facilities to access the proper derived class. The interface of the class Semantic is:

#ifndef INCLUDED_SEMANTIC_
#define INCLUDED_SEMANTIC_

#include <ostream>

#include "../base/base.h"

class Semantic
{
    Base *d_bp;

    public:
        Semantic(Base *bp = 0);             // Semantic will own the bp
        Semantic(Semantic const &other);
        ~Semantic();

        Semantic &operator=(Semantic const &other);

        Base const &base() const;

        template <typename Class>
        Class const &downcast();

    private:
        void copy(Semantic const &other);
};

inline Base const &Semantic::base() const
{
    return *d_bp;
}

inline Semantic::Semantic(Base *bp)
:
    d_bp(bp)
{}

inline Semantic::~Semantic()
{
    delete d_bp;
}

inline Semantic::Semantic(Semantic const &other)
{
    copy(other);
}

inline Semantic &Semantic::operator=(Semantic const &other)
{
    if (this != &other)
    {
        delete d_bp;
        copy(other);
    }
    return *this;
}

inline void Semantic::copy(Semantic const &other)
{
    d_bp = other.d_bp ? other.d_bp->clone() : 0;
}

template <typename Class>
inline  Class const &Semantic::downcast()
{
    return dynamic_cast<Class &>(*d_bp);
}

inline std::ostream &operator<<(std::ostream &out, Semantic const &obj)
{
    if (&obj.base())
        return out << obj.base();

    return out << "<UNDEFINED>";
}

#endif

Semantic is a slightly more `complete' class than Base and its derivatives, since it contains a pointer which must be handled appropriately. So it needs a copy constructor, an overloaded assignment operator and a destructor. Apart from that, it supports a member returning a reference to the base class. This reference is used by the overloaded operator<<() to allow objects of classes derived from Base to be inserted into streams. It also offers a small member template returning a reference to a derived class object from the semantic value's Base class pointer. This member effectively implements (and improves upon) the type safety that is otherwise aimed for by typed nonterminals and typed tokens (using the %type directive).

23.9.3.1: The parser using a polymorphic semantic value type

In Bisonc++'s grammar specification %stype will of course be Semantic. A simple grammar is defined for this illustrative example. The grammar expects input according to the following rule:
    rule:
        IDENTIFIER '(' IDENTIFIER ')' ';'
    |
        IDENTIFIER '=' INT ';'
    ;

The rule's actions simply echo the received identifiers and int values to cout. Here is an example of a decorated production rule:

    IDENTIFIER '=' INT ';'
    {
          cout << $1 << " " << $3 << endl;
    }

Alternative actions could easily be defined, e.g., using the Semantic::downcast() member:

    IDENTIFIER '=' INT ';'
    {
        int value = $3.downcast<Int>().value();
    }

Bisonc++'s parser stores all semantic values on its semantic values stack (irrespective of the number of tokens that were defined in a particular production rule). At any time all semantic values associated with previously recognized tokens are available in an action block. Once the semantic value stack is reduced, the Semantic class takes care of the proper destruction of the objects pointed to by the Semantic data member d_bp.

The scanner must of course be able to access the parser's data member representing the most recent semantic value. This data member is available as the parser's data member d_val__, which can be offered to the scanner object at its construction time. E.g., with a scanner expecting an STYPE__ & the parser's constructor could simply be defined as:

    inline Parser::Parser()
    :
        d_scanner(d_val__)
    {}

23.9.3.2: The scanner using a polymorphic semantic value type

As discussed in the previous section the scanner must have access to the parser's d_val__ data member. Therefore the Scanner class may define a Semantic &d_semval reference data member, which is initialized with the parser's d_val__, passed to the Scanner's constructor via the constructor's semval parameter:
    inline Scanner::Scanner(Parser::STYPE__ &semval)
    :                       // or: Semantic &semval
        d_semval(semval)
    {}

The scanner (generated by flex(1)) recognizes input patterns, returns Parser tokens (e.g., Parser::INT), and sets the semantic value when applicable. E.g., when recognizing a Parser::INT the rule is:

    [0-9]+  {
                d_semval = new Int(yytext);
                return Parser::INT;
            }

Note that, as the Semantic constructor expects but one argument, automatic promotion from Base * to Semantic can be used in the assignments to d_semval.

The IDENTIFIER's semantic value is obtained as follows:

[a-zA-Z_][a-zA-Z0-9_]* { d_semval = new Text(yytext); return Parser::IDENTIFIER; }