Chapter 8: Classes And Memory Allocation

Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken.

Please state the document version you're referring to, as found in the title (in this document: 8.1.0~pre2) and please state chapter and paragraph name or number you're referring to.

All received mail is processed conscientiously, and received suggestions for improvements will usually have been processed by the time a new version of the Annotations is released. Except for the incidental case I will normally not acknowledge the receipt of suggestions for improvements. Please don't interpret this as me not appreciating your efforts.

In contrast to the set of functions that handle memory allocation in C (i.e., malloc etc.), memory allocation in C++ is handled by the operators new and delete. Important differences between malloc and new are:

A comparable relationship exists between free and delete: delete makes sure that when an object is deallocated, its destructor is automatically called.

The automatic calling of constructors and destructors when objects are created and destroyed has consequences which we shall discuss in this chapter. Many problems encountered during C program development are caused by incorrect memory allocation or memory leaks: memory is not allocated, not freed, not initialized, boundaries are overwritten, etc. C++ does not `magically' solve these problems, but it does provide us with tools to prevent these kinds of problems.

As a consequence of malloc and friends becoming deprecated the very frequently used str... functions, like strdup, that are all malloc based, should be avoided in C++ programs. Instead, the facilities of the string class and operators new and delete should be used.

Memory allocation procedures influence the way classes dynamically allocating their own memory should be designed. Therefore, in this chapter these topics are discussed in addition to discussions about operators new and delete. We'll first cover the peculiarities of operators new and delete, followed by a discussion about:

8.1: Operators `new' and `delete'

C++ defines two operators to allocate memory and to return it to the `common pool'. These operators are, respectively, new and delete.

Here is a simple example illustrating their use. An int pointer variable points to memory allocated by operator new. This memory is later released by operator delete.

    int *ip = new int;
    delete ip;

Here are some characteristics of operators new and delete:

Operator new can be used to allocate primitive types but also to allocate objects. When a primitive type or a struct type without a constructor is allocated the allocated memory is not guaranteed to be initialized to 0, but an initialization expression may be provided:

    int *v0 = new int;          // not guaranteed to be initialized to 0
    int *v1 = new int();        // initialized to 0
    int *v2 = new int(3);       // initialized to 3
    int *v3 = new int(3 * *v2); // initialized to 9
When a class-type object is allocated, the arguments of its constructor (if any) are specified immediately following the type specification in the new expression and the object will be initialized according to the thus specified constructor. For example, to allocate string objects the following statements could be used:
    string *s1 = new string;            // uses the default constructor
    string *s2 = new string();          // same
    string *s3 = new string(4, ' ');    // initializes to 4 blanks.
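
The initialization forms shown above can be collected in a small sketch. The function names sumOfInitializedInts and blanks are illustrative only and are not part of the Annotations; each allocation is paired with its own delete.

```cpp
#include <cstddef>
#include <string>

// Allocates ints using the initialization forms shown above and
// returns the sum of the stored values.
int sumOfInitializedInts()
{
    int *v1 = new int();            // value-initialized: 0
    int *v2 = new int(3);           // initialized to 3
    int *v3 = new int(3 * *v2);     // initialized to 9

    int sum = *v1 + *v2 + *v3;      // 0 + 3 + 9

    delete v3;                      // every new is matched by a delete
    delete v2;
    delete v1;
    return sum;
}

// Allocates a string of `count` blanks dynamically and returns a copy
// of its contents.
std::string blanks(std::size_t count)
{
    std::string *sp = new std::string(count, ' ');
    std::string copy(*sp);
    delete sp;
    return copy;
}
```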

In addition to using new to allocate memory for a single entity or an array of entities there is also a variant that allocates raw memory: operator new(sizeInBytes). Raw memory is returned as a void *. Here new allocates a block of memory for unspecified purpose. Although raw memory may consist of multiple characters it should not be interpreted as an array of characters. Since raw memory returned by new is returned as a void * its return value can be assigned to a void * variable. More often it is assigned to a char * variable, using a cast. Here is an example:

    char *chPtr = static_cast<char *>(operator new(numberOfBytes));
The use of raw memory is frequently encountered in combination with the placement new operator, discussed in section 8.1.4.
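
As a minimal sketch of handling raw memory: the function rawMemoryRoundTrip below is hypothetical, and returning the block through operator delete(pointer) is a choice made for this sketch. No constructor or destructor is involved at any point.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Obtains `numberOfBytes` bytes of raw memory, fills them with `value`,
// and reports whether the last byte holds that value.
bool rawMemoryRoundTrip(std::size_t numberOfBytes, unsigned char value)
{
    if (numberOfBytes == 0)
        return true;                                // nothing to check

    void *raw = operator new(numberOfBytes);        // no constructor runs
    unsigned char *bytes = static_cast<unsigned char *>(raw);

    std::memset(bytes, value, numberOfBytes);
    bool ok = bytes[numberOfBytes - 1] == value;

    operator delete(raw);                           // no destructor runs
    return ok;
}
```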

8.1.1: Allocating arrays

Operator new[] is used to allocate arrays. The generic notation new[] is used in the C++ Annotations. Actually, the number of elements to be allocated must be specified between the square brackets and it must, in turn, be prefixed by the type of the entities that must be allocated. Example:
    int *intarr = new int[20];          // allocates 20 ints
    string *stringarr = new string[10]; // allocates 10 strings.
Operator new is a different operator than operator new[]. A consequence of this difference is discussed in the next section (8.1.2).

Arrays allocated by operator new[] are called dynamic arrays. They are constructed during the execution of a program, and their lifetime may exceed the lifetime of the function in which they were created. Dynamically allocated arrays may last for as long as the program runs.

When new[] is used to allocate an array of primitive values or an array of objects, new[] must be specified with a type and an (unsigned) expression between its square brackets. The type and expression together are used by the compiler to determine the required size of the block of memory to make available. When new[] is used the array's elements are stored consecutively in memory. An array index expression may thereafter be used to access the array's individual elements: intarr[0] represents the first int value, immediately followed by intarr[1], and so on until the last element (intarr[19]). With non-class types (primitive types, struct types without constructors) the block of memory returned by operator new[] is not guaranteed to be initialized to 0.
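
A small sketch may illustrate that a dynamic array outlives the function creating it and that its elements are reached by index expressions. The names makeSquares and sum are assumptions made for this example; the caller is responsible for calling delete[].

```cpp
#include <cstddef>

// Returns a dynamic array holding the squares 0, 1, 4, ...; the caller
// owns the array and must release it with delete[].
int *makeSquares(std::size_t count)
{
    int *squares = new int[count];          // elements start uninitialized

    for (std::size_t idx = 0; idx != count; ++idx)
        squares[idx] = static_cast<int>(idx * idx);

    return squares;                         // the array outlives this function
}

// Sums the first `count` elements of `values`.
int sum(int const *values, std::size_t count)
{
    int total = 0;
    for (std::size_t idx = 0; idx != count; ++idx)
        total += values[idx];
    return total;
}
```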

When operator new[] is used to allocate arrays of objects their constructors are automatically used. Consequently new string[20] results in a block of 20 initialized string objects. When allocating arrays of objects the class's default constructor is used to initialize each individual object in turn. A non-default constructor cannot be called, but often it is possible to work around that as discussed in section 13.8.

The expression between brackets of operator new[] represents the number of elements of the array to allocate. The C++ standard allows allocation of 0-sized arrays. The statement new int[0] is correct C++. However, it is also pointless and confusing and should be avoided. It is pointless as it doesn't refer to any element at all, it is confusing as the returned pointer has a useless non-0 value. A pointer intending to point to an array of values should be initialized (like any pointer that isn't yet pointing to memory) to 0, allowing for expressions like if (ptr) ...

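Without using operator new[], arrays of variable sizes can also be constructed as local arrays, provided the compiler supports them: they are a compiler extension (offered by, e.g., g++) rather than part of standard C++. Such arrays are not dynamic arrays and their lifetimes are restricted to the lifetime of the block in which they were defined.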

Once allocated, all arrays have fixed sizes. There is no simple way to enlarge or shrink arrays. C++ has no operator `renew'. Section 8.1.3 illustrates how an array can be enlarged.

8.1.2: Deleting arrays

Dynamically allocated arrays are deleted using operator delete[]. It expects a pointer to a block of memory, previously allocated by operator new[].

When operator delete[]'s operand is a pointer to an array of objects two actions will be performed:

Here is an example showing how to allocate and delete an array of 10 string objects:
    std::string *sp = new std::string[10];
    delete[] sp;
No special action is performed if a dynamically allocated array of primitive typed values is deleted. Following int *it = new int[10] the statement delete[] it simply returns the memory pointed at by it. Realize that, as a pointer is a primitive type, deleting a dynamically allocated array of pointers to objects will not result in the proper destruction of the objects the array's elements point at. So, the following example results in a memory leak:
    string **sp = new string *[5];
    for (size_t idx = 0; idx != 5; ++idx)
        sp[idx] = new string;
    delete[] sp;            // MEMORY LEAK !
In this example the only action performed by delete[] is to return an area the size of five pointers to strings to the common pool.

Here's how the destruction in such cases should be performed:

Example:
    for (size_t idx = 0; idx != 5; ++idx)
        delete sp[idx];
    delete[] sp;
One of the consequences is of course that by the time the memory is going to be returned not only the pointer must be available but also the number of elements it contains. This can easily be accomplished by storing pointer and number of elements in a simple class and then using an object of that class.
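
A minimal sketch of such a simple class follows. The class name StringPointers and its members are hypothetical. Copying is disabled because copying objects that own memory has not been covered at this point, and exception safety is ignored for brevity.

```cpp
#include <cstddef>
#include <string>

// Owns an array of pointers to individually allocated strings and
// remembers the number of elements.
class StringPointers
{
    std::string **d_ptr;
    std::size_t d_count;

    public:
        explicit StringPointers(std::size_t count)
        :
            d_ptr(new std::string *[count]),
            d_count(count)
        {
            for (std::size_t idx = 0; idx != d_count; ++idx)
                d_ptr[idx] = new std::string;
        }

        ~StringPointers()
        {
            for (std::size_t idx = 0; idx != d_count; ++idx)
                delete d_ptr[idx];          // destroy each string first
            delete[] d_ptr;                 // then return the pointer array
        }

        std::string &at(std::size_t idx)
        {
            return *d_ptr[idx];
        }

        std::size_t size() const
        {
            return d_count;
        }

    private:
        StringPointers(StringPointers const &other);            // not implemented
        StringPointers &operator=(StringPointers const &other); // not implemented
};
```

An object like StringPointers sp(5) releases all five strings as well as the pointer array when it goes out of scope.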

Operator delete[] is a different operator than operator delete. The rule of thumb is: if new[] was used, also use delete[].

8.1.3: Enlarging arrays

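Once allocated, all arrays have fixed sizes. There is no simple way to enlarge or shrink arrays. C++ has no renew operator. The basic steps to take when enlarging an array are the following: allocate a new array of the required size; copy the old array's elements to the new array; delete the old array; and let the pointer to the array point to the new array. Static and local arrays cannot be resized. Resizing is only possible for dynamically allocated arrays. Example: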
    #include <string>
    using namespace std;

    string *enlarge(string *old, unsigned oldsize, unsigned newsize)
    {
        string *tmp = new string[newsize];  // allocate larger array

        for (size_t idx = 0; idx != oldsize; ++idx)
            tmp[idx] = old[idx];            // copy old to tmp

        delete[] old;                       // delete the old array
        return tmp;                         // return new array
    }

    int main()
    {
        string *arr = new string[4];        // initially: array of 4 strings
        arr = enlarge(arr, 4, 6);           // enlarge arr to 6 elements.
        delete[] arr;                       // return the enlarged array
    }

The procedure to enlarge shown in the example also has several drawbacks.

Depending on the context various solutions exist to improve the efficiency of this rather inefficient procedure. An array of pointers could be used (requiring only the pointers to be copied, no destruction, no superfluous initialization) or raw memory in combination with the placement new operator could be used (an array of objects remains available, no destruction, no superfluous construction).
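
The pointer-based alternative might look like the following sketch. The function enlargePointers is hypothetical and assumes that newsize is at least oldsize. Only pointers are copied, and the strings themselves are left untouched.

```cpp
#include <cstddef>
#include <string>

// Enlarges an array of pointers to strings. Only the pointers are copied;
// the strings they point at are neither copied nor destroyed.
// Assumes newsize >= oldsize.
std::string **enlargePointers(std::string **old, std::size_t oldsize,
                              std::size_t newsize)
{
    std::string **tmp = new std::string *[newsize];

    for (std::size_t idx = 0; idx != oldsize; ++idx)
        tmp[idx] = old[idx];                // copy the pointer only

    for (std::size_t idx = oldsize; idx < newsize; ++idx)
        tmp[idx] = 0;                       // no string installed yet

    delete[] old;                           // returns the old pointer array only
    return tmp;
}
```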

8.1.4: The `placement new' operator

A remarkable form of operator new is called the placement new operator. Here operator new is provided with an existing block of memory in which an object or value is initialized. The block of memory should of course be large enough to contain the object, but apart from that no other requirements exist. It is easy to determine how much memory is used by an entity (object or variable) of type Type: the sizeof operator returns the number of bytes required by a Type entity. Entities may of course dynamically allocate memory for their own use. Dynamically allocated memory, however, is not part of the entity's memory `footprint' but it is always made available externally to the entity itself. This is why sizeof returns the same value when applied to different string objects returning different length and capacity values.

The placement new operator uses the following syntax (using Type to indicate the used data type):

    Type *new(void *memory) Type(arguments);
Here, memory is a block of memory of at least sizeof(Type) bytes large and Type(arguments) is any constructor of the class Type.
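
A minimal sketch of this syntax in use: a std::string is constructed in a local buffer and destroyed again by an explicit destructor call (explained later in this section). The function name lengthViaPlacementNew is hypothetical, and the alignas specifier (C++11) is an addition of this sketch to keep the buffer suitably aligned.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Constructs a std::string inside a local buffer using placement new
// and returns the length of the constructed string.
std::size_t lengthViaPlacementNew(char const *text)
{
    using std::string;

    // alignas keeps the buffer suitably aligned for a string object
    alignas(string) unsigned char buffer[sizeof(string)];

    string *sp = new (buffer) string(text);     // construct in place
    std::size_t length = sp->length();

    sp->~string();      // explicit destructor call; the buffer is local
    return length;
}
```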

The placement new operator is useful in situations where classes set aside memory to be used later. This is used, e.g., by std::string to change its capacity. Calling string::reserve may enlarge that capacity without making memory beyond the string's length immediately available. But the object itself may access its additional memory and so when information is added to a string object it can draw memory from its capacity rather than having to perform a reallocation for each single addition of information.

Let's apply that philosophy to a class Strings storing std::string objects. The class defines a char *d_memory accessing the memory holding its d_size string objects as well as d_capacity - d_size reserved memory. Assuming that a default constructor initializes d_capacity to 1, doubling d_capacity whenever an additional string must be stored, the class must support the following essential operations:

To double the capacity new memory is allocated, old memory is copied into the newly allocated memory, and the old memory is deleted. This is implemented by the member void Strings::reserve, assuming d_capacity has already been given its proper value:
void Strings::reserve()
{
    char *newMemory =
        static_cast<char *>(memcpy(
                                operator new(d_capacity * sizeof(std::string)),
                                d_memory,
                                d_size * sizeof(std::string)
                            ));

    delete d_memory;
    d_memory = newMemory;
}

The raw memory is made available by operator new(sizeInBytes). This should not be interpreted as an array of any kind, so a plain delete d_memory is used to return the previously allocated block of raw memory.

The member append adds another string object to a Strings object. A (public) member reserve(request) ensures that the Strings object's capacity is sufficient. Then the placement new operator is used to install the next string into the raw memory's appropriate location:

void Strings::append(std::string const &next)
{
    reserve(d_size + 1);

    new (reinterpret_cast<std::string *>(d_memory) + d_size)
        std::string(next);

    ++d_size;
}

At the end of the Strings object's lifetime all its dynamically allocated memory must be returned. This is the responsibility of the destructor, as explained in the next section. The destructor's full definition is postponed to that section, but its actions when placement new is involved can be discussed here.

With placement new an interesting situation is encountered. Objects, possibly themselves allocating memory, are installed in memory that may or may not have been allocated dynamically, but that is definitely not completely filled with such objects. So a simple delete[] can't be used, but a delete for each of the objects that are available can't be used either, since that would also delete the memory of the objects themselves, which wasn't dynamically allocated.

This peculiar situation is solved in a peculiar way, only encountered in cases where the placement new operator has been used: memory allocated by objects initialized using placement new is returned by explicitly calling the object's destructor. The destructor is declared as a member having the class preceded by a tilde as its name, not using any arguments. So, std::string's destructor is named ~string. The memory allocated by our class Strings is therefore properly destroyed as follows (in the example assume that using namespace std was specified):

    for
    (
        string *sp = reinterpret_cast<string *>(d_memory) + d_size;
            sp-- != reinterpret_cast<string *>(d_memory);
    )
        sp->~string();

    delete d_memory;
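
Putting the pieces together, a class like Strings could be sketched as follows. This is not the Annotations' implementation: the members at, destroy, operator[] and capacity are hypothetical; existing strings are copy-constructed into the new block instead of being copied by memcpy; the raw block is returned by operator delete(pointer); and copying Strings objects as well as exception safety are left out.

```cpp
#include <cstddef>
#include <new>
#include <string>

class Strings
{
    char *d_memory;             // raw memory holding the string objects
    std::size_t d_size;         // number of constructed strings
    std::size_t d_capacity;     // number of strings fitting in d_memory

    public:
        Strings()
        :
            d_memory(static_cast<char *>(operator new(sizeof(std::string)))),
            d_size(0),
            d_capacity(1)
        {}

        ~Strings()
        {
            destroy(d_memory, d_size);
        }

        void reserve(std::size_t request)
        {
            if (request <= d_capacity)
                return;

            std::size_t newCapacity = d_capacity;
            while (newCapacity < request)
                newCapacity *= 2;               // double until it fits

            char *newMemory = static_cast<char *>(
                        operator new(newCapacity * sizeof(std::string)));

            // copy-construct each existing string in the new block
            for (std::size_t idx = 0; idx != d_size; ++idx)
                new (at(newMemory, idx)) std::string(*at(d_memory, idx));

            destroy(d_memory, d_size);          // old strings, old block
            d_memory = newMemory;
            d_capacity = newCapacity;
        }

        void append(std::string const &next)
        {
            reserve(d_size + 1);
            new (at(d_memory, d_size)) std::string(next);   // placement new
            ++d_size;
        }

        std::string const &operator[](std::size_t idx) const
        {
            return *at(d_memory, idx);
        }

        std::size_t size() const     { return d_size; }
        std::size_t capacity() const { return d_capacity; }

    private:
        Strings(Strings const &other);              // copying not supported
        Strings &operator=(Strings const &other);

        static std::string *at(char *memory, std::size_t idx)
        {
            return reinterpret_cast<std::string *>(memory) + idx;
        }

        // Calls the destructors of `count` strings in reverse order, then
        // returns the raw block.
        static void destroy(char *memory, std::size_t count)
        {
            using std::string;
            while (count != 0)
                at(memory, --count)->~string();     // explicit destructor call
            operator delete(memory);
        }
};
```

Starting from a capacity of 1, appending three strings doubles the capacity twice, so the object ends up with room for four strings.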

So far, so good. All is well as long as we're using but one object. What about allocating an array of objects? Initialization is performed as usual. But as with delete, delete[] cannot be called when the buffer was allocated statically. Instead, when multiple objects were initialized using the placement new operator in combination with a statically allocated buffer all the objects' destructors must be called explicitly, as in the following example:

    char buffer[3 * sizeof(string)];
    string *sp = new(buffer) string [3];

    for (size_t idx = 0; idx < 3; ++idx)
        sp[idx].~string();

8.2: The destructor

Comparable to the constructor, classes may define a destructor. This function is the constructor's counterpart in the sense that it is invoked when an object ceases to exist. A destructor is usually called automatically, but that's not always true. The destructors of dynamically allocated objects are not automatically activated, but in addition to that: when a program is interrupted by an exit call, only the destructors of already initialized global objects are called. In that situation destructors of objects defined locally by functions are also not called. This is one (good) reason for avoiding exit in C++ programs.

Destructors obey the following syntactical requirements:

Destructors are declared in their class interfaces. Example:
    class StringStore
    {
        public:
            StringStore();
            ~StringStore();     // the destructor
    };
By convention the constructors are declared first. The destructor is declared next, to be followed by other member functions.

A destructor's main task is to ensure that memory allocated by an object is properly returned when the object ceases to exist. Consider the following interface of the class StringStore:

    class StringStore
    {
        std::string *d_string;
        size_t d_size;

        public:
            StringStore();
            StringStore(char const *const *cStrings, size_t n);
            ~StringStore();

            std::string const &at(size_t idx) const;
            size_t size() const;
    };

The constructor's task is to initialize the data fields of the object. E.g., its constructors are defined as follows:

    StringStore::StringStore()
    :
        d_string(0),
        d_size(0)
    {}

    StringStore::StringStore(char const *const *cStrings, size_t size)
    :
        d_string(new string[size]),
        d_size(size)
    {
        for (size_t idx = 0; idx != size; ++idx)
            d_string[idx] = cStrings[idx];
    }

As objects of the class StringStore allocate memory a destructor is clearly required. Destructors may or may not be called automatically. Here are the rules:

The destructor's task is to ensure that all memory that is dynamically allocated and controlled only by the object itself is returned. The task of the StringStore's destructor would therefore be to delete the memory to which d_string points. Its implementation is:
    StringStore::~StringStore()
    {
        delete[] d_string;
    }
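
When destructors run can be made visible with a small sketch. The class Tracked and the functions aliveInsideBlock and allocate are hypothetical; Tracked merely counts how many of its objects currently exist.

```cpp
#include <cstddef>

// Hypothetical class counting how many of its objects currently exist.
class Tracked
{
    static std::size_t s_alive;

    public:
        Tracked()   { ++s_alive; }
        ~Tracked()  { --s_alive; }

        static std::size_t alive() { return s_alive; }
};

std::size_t Tracked::s_alive = 0;

// Returns the number of live objects observed inside a nested block;
// the local object is destroyed automatically when the block ends.
std::size_t aliveInsideBlock()
{
    std::size_t inside;
    {
        Tracked local;
        inside = Tracked::alive();
    }                               // ~Tracked is called here, automatically
    return inside;
}

// Returns a dynamically allocated array of `count` objects; their
// destructors only run once the caller uses delete[].
Tracked *allocate(std::size_t count)
{
    return new Tracked[count];
}
```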

The next example shows StringStore at work. In process a StringStore store is created, and its data are displayed. It returns a dynamically allocated StringStore object to main. A StringStore * receives the address of the allocated object and deletes the object again. Another StringStore object is then created in a block of memory made available locally in main, and an explicit call to ~StringStore is required to return the memory allocated by that object. In the example only one StringStore object is destroyed automatically: the local StringStore object defined by process. The other two StringStore objects require explicit actions to prevent memory leaks.

    #include "stringstore.h"
    #include <iostream>

    using namespace std;

    void display(StringStore const &store)
    {
        for (size_t idx = 0; idx != store.size(); ++idx)
            cout << store.at(idx) << '\n';
    }

    StringStore *process(char *argv[], int argc)
    {
        StringStore store(argv, argc);
        display(store);
        return new StringStore(argv, argc);
    }

    int main(int argc, char *argv[])
    {
        StringStore *sp = process(argv, argc);
        delete sp;

        char buffer[sizeof(StringStore)];
        sp = new (buffer) StringStore(argv, argc);
        sp->~StringStore();
    }

8.2.1: Object pointers revisited

Operators new and delete are used when an object or variable is allocated. One of the advantages of the operators new and delete over functions like malloc and free is that new and delete call the corresponding object constructors and destructors.

The allocation of an object by operator new is a two-step process. First the memory for the object itself is allocated. Then its constructor is called, initializing the object. Analogously to the construction of an object, the destruction is also a two-step process: first, the destructor of the class is called, deleting the memory controlled by the object. Then the memory used by the object itself is freed.
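
The order of these steps can be observed with a sketch. The class Probe is hypothetical; it provides class-specific versions of operator new and operator delete (a facility not covered in this section) purely as a device to log when memory is obtained and returned.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Hypothetical class recording the order of allocation, construction,
// destruction and deallocation in a log.
class Probe
{
    public:
        static std::string s_log;

        Probe()     { s_log += "construct "; }
        ~Probe()    { s_log += "destruct "; }

        static void *operator new(std::size_t size)
        {
            s_log += "allocate ";
            return ::operator new(size);        // obtain the memory
        }

        static void operator delete(void *memory)
        {
            s_log += "free ";
            ::operator delete(memory);          // return the memory
        }
};

std::string Probe::s_log;

// Creates and destroys one Probe object and returns the recorded steps.
std::string lifeCycle()
{
    Probe::s_log.clear();
    Probe *probe = new Probe;   // first allocate, then construct
    delete probe;               // first destruct, then free
    return Probe::s_log;
}
```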

Dynamically allocated arrays of objects can also be handled by new and delete. When allocating an array of objects using operator new the default constructor is called for each object in the array. In cases like this operator delete[] must be used to ensure that the destructor is called for each of the objects in the array.

However, the addresses returned by new Type and new Type[size] are of identical types, in both cases a Type *. Consequently it cannot be determined by the type of the pointer whether a pointer to dynamically allocated memory points to a single entity or to an array of entities.

What happens if delete rather than delete[] is used? Consider the following situation, in which the destructor ~StringStore is modified so that it tells us that it is called. In a main function an array of two StringStore objects is allocated by new, to be deleted by delete []. Next, the same actions are repeated, albeit that the delete operator is called without []:

    #include <iostream>
    #include "stringstore.h"
    using namespace std;

    StringStore::~StringStore()
    {
        cout << "StringStore destructor called" << '\n';
    }

    int main()
    {
        StringStore *a  = new StringStore[2];

        cout << "Destruction with []'s" << '\n';
        delete[] a;

        a = new StringStore[2];

        cout << "Destruction without []'s" << '\n';
        delete a;
    }
/*
    Generated output:
Destruction with []'s
StringStore destructor called
StringStore destructor called
Destruction without []'s
StringStore destructor called
*/
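From the generated output, we see that the destructors of the individual StringStore objects are called when delete[] is used, while only the first object's destructor is called if the [] is omitted. Formally, using delete on memory allocated by new[] results in undefined behavior, so the output shown merely reflects what happened in this particular run.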

Conversely, if delete[] is called in a situation where delete should have been called the results are unpredictable, and will most likely cause the program to crash. This problematic behavior is caused by the way the run-time system stores information about the size of the allocated array (usually right before the array's first element). If a single object is allocated the array-specific information is not available, but it is nevertheless assumed present by delete[]. This latter operator will interpret bogus values before the array's first element as size information, thus usually causing the program to fail.

If no destructor is defined, a default destructor is defined by the compiler. The default destructor ensures that the destructors of composed objects (as well as the destructors of base classes if a class is a derived class, cf. chapter 13) are called. This has serious implications: objects allocating memory will cause a memory leak unless precautionary measures are taken (by defining an appropriate destructor). Consider the following program:

    #include <iostream>
    #include "stringstore.h"
    using namespace std;

    StringStore::~StringStore()
    {
        cout << "StringStore destructor called" << '\n';
    }

    int main()
    {
        StringStore **ptr = new StringStore* [2];

        ptr[0] = new StringStore[2];
        ptr[1] = new StringStore[2];

        delete[] ptr;
    }
This program produces no output at all. Why is this? The variable ptr is defined as a pointer to a pointer. The dynamically allocated array therefore consists of pointer variables and pointers are of a primitive type. No destructors exist for primitive typed variables. Consequently only the array itself is returned, and no StringStore destructor is called.

Of course, we don't want this, but require the StringStore objects pointed to by the elements of ptr to be deleted too. In this case we have two options:

8.2.2: The function set_new_handler()

The C++ run-time system ensures that when memory allocation fails an error function is activated. By default this function throws a bad_alloc exception (see section 9.8), terminating the program. Therefore it is not necessary to check the return value of operator new. Operator new's default behavior may be modified in various ways. One way to modify its behavior is to redefine the function that's called when memory allocation fails. Such a function must comply with the following requirements:

A redefined error function might, e.g., print a message and terminate the program. The user-written error function becomes part of the allocation system through the function set_new_handler.

Such an error function is illustrated below (This implementation applies to the Gnu C/C++ requirements. Actually using the program given in the next example is not advised, as it will probably slow down your computer enormously due to the resulting use of the operating system's swap area.):

    #include <cstdlib>      // exit
    #include <cstring>      // memset
    #include <iostream>
    #include <new>          // set_new_handler
    using namespace std;

    void outOfMemory()
    {
        cout << "Memory exhausted. Program terminates." << '\n';
        exit(1);
    }

    int main()
    {
        long allocated = 0;

        set_new_handler(outOfMemory);       // install error function

        while (true)                        // eat up all memory
        {
            memset(new int [100000], 0, 100000 * sizeof(int));
            allocated += 100000 * sizeof(int);
            cout << "Allocated " << allocated << " bytes\n";
        }
    }
Once the new error function has been installed it is automatically invoked when memory allocation fails, and the program is terminated. Memory allocation may fail in indirectly called code as well, e.g., when constructing or using streams or when strings are duplicated by low-level functions.

So much for the theory. On some systems the `out of memory' condition may actually never be reached, as the operating system may interfere before the run-time support system gets a chance to stop the program.

The standard C functions allocating memory (like strdup, malloc, realloc etc.) do not trigger the new handler when memory allocation fails and should be avoided in C++ programs.
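
A handler does not have to end the program: it may also give up by throwing a bad_alloc exception. The sketch below uses that to show, without exhausting the computer's memory, that an installed handler is called when a request fails. The names recordAndThrow and handlerRunsOnFailure are hypothetical, and the sketch assumes that a request for half of the addressable memory cannot be honored.

```cpp
#include <cstddef>
#include <limits>
#include <new>

namespace
{
    bool handlerCalled = false;

    // Hypothetical handler: records the call and gives up by throwing
    // bad_alloc, which ends operator new's attempts to allocate.
    void recordAndThrow()
    {
        handlerCalled = true;
        throw std::bad_alloc();
    }
}

// Requests an absurdly large block of raw memory and reports whether
// the installed handler was called before the request failed.
bool handlerRunsOnFailure()
{
    handlerCalled = false;
    std::new_handler previous = std::set_new_handler(recordAndThrow);

    bool failed = false;
    try
    {
        std::size_t huge = std::numeric_limits<std::size_t>::max() / 2;
        void *memory = operator new(huge);
        operator delete(memory);        // only reached if the request succeeds
    }
    catch (std::bad_alloc const &)
    {
        failed = true;
    }

    std::set_new_handler(previous);     // restore the earlier handler
    return failed && handlerCalled;
}
```

Note that set_new_handler returns the previously installed handler, which allows the earlier situation to be restored.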

8.3: The assignment operator

In C++ struct and class type objects can be directly assigned new values in the same way as this is possible in C. The default action of such an assignment for non-class type data members is a straight byte-by-byte copy from one data member to another. For now we'll use the following simple class Person:
    class Person
    {
        char *d_name;
        char *d_address;
        char *d_phone;

        public:
            Person();
            Person(char const *name, char const *addr, char const *phone);
            ~Person();
        private:
            char *strdupnew(char const *src);   // returns a copy of src.
    };
Person's data members are initialized to zeroes or to copies of the ASCII-Z strings passed to Person's constructor, using some variant of strdup. Its destructor will return the allocated memory again.
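
The text does not show strdupnew itself. A minimal sketch of what such a strdup variant might look like is given below, written as a free function rather than as a private member. Accepting a 0-pointer is an assumption made for this sketch.

```cpp
#include <cstddef>
#include <cstring>

// Returns a copy of the ASCII-Z string `src` in memory allocated by new[];
// the caller releases the copy with delete[]. A 0-pointer yields 0.
char *strdupnew(char const *src)
{
    if (src == 0)
        return 0;

    char *copy = new char[std::strlen(src) + 1];    // room for the final 0
    std::strcpy(copy, src);
    return copy;
}
```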

Now consider the consequences of using Person objects in the following example:

    void tmpPerson(Person const &person)
    {
        Person tmp;
        tmp = person;
    }
Here's what happens when tmpPerson is called: Now a potentially dangerous situation has been created. The actual values in person are pointers, pointing to allocated memory. After the assignment this memory is addressed by two objects: person and tmp. This problematic assignment is illustrated in Figure 4.

Figure 4 is shown here.
Figure 4: Private data and public interface functions of the class Person, using byte-by-byte assignment


Having executed tmpPerson, the object referenced by person now contains pointers to deleted memory.

This is undoubtedly not a desired effect of using a function like tmpPerson. The deleted memory will likely be reused by subsequent allocations. The pointer members of person have effectively become wild pointers, as they don't point to allocated memory anymore. In general it can be concluded that

every class containing pointer data members is a potential candidate for trouble.
Fortunately, it is possible to prevent these troubles, as discussed next.
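
The sharing caused by default assignment can be demonstrated safely with a struct lacking a destructor, so that the shared memory is deleted only once. The struct Shallow and the function sharesMemoryAfterAssignment are hypothetical.

```cpp
#include <cstring>

// Hypothetical struct without destructor or assignment operator, so the
// default member-wise assignment can be observed safely.
struct Shallow
{
    char *d_name;
};

// Returns true if, after default assignment, both objects address the
// very same block of memory.
bool sharesMemoryAfterAssignment()
{
    Shallow original;
    original.d_name = new char[6];
    std::strcpy(original.d_name, "Frank");

    Shallow copy;
    copy = original;                    // copies the pointer, not the text

    copy.d_name[0] = 'P';               // modifies original's text as well
    bool shared = copy.d_name == original.d_name
                  && std::strcmp(original.d_name, "Prank") == 0;

    delete[] original.d_name;           // delete once: both pointers refer
    return shared;                      // to this single block
}
```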

8.3.1: Overloading the assignment operator

Obviously, the right way to assign one Person object to another is not to copy the contents of the object bytewise. A better way is to make an equivalent object: one having its own allocated memory containing copies of the original strings.

The way to assign a Person object to another is illustrated in Figure 5.

Figure 5 is shown here.
Figure 5: Private data and public interface functions of the class Person, using the `correct' assignment.


There are several ways to assign a Person object to another. One way would be to define a special member function to handle the assignment. The purpose of this member function would be to create a copy of an object having its own name, address and phone strings. Such a member function could be:

    void Person::assign(Person const &other)
    {
            // delete our own previously used memory
        delete[] d_name;
        delete[] d_address;
        delete[] d_phone;

            // copy the other Person's data
        d_name    = strdupnew(other.d_name);
        d_address = strdupnew(other.d_address);
        d_phone   = strdupnew(other.d_phone);
    }
Using assign we could rewrite the offending function tmpPerson:
    void tmpPerson(Person const &person)
    {
        Person tmp;

            // tmp (having its own memory) holds a copy of person 
        tmp.assign(person);

            // now it doesn't matter that tmp is destroyed..
    }
This solution is valid, although it merely treats a symptom. It requires the programmer to use a specific member function instead of the assignment operator. The original problem (assignment produces wild pointers) is still not solved. Since it is hard to `strictly adhere to a rule' a way to solve the original problem is of course preferred.

Fortunately a solution exists using operator overloading: the possibility C++ offers to redefine the actions of an operator in a given context. Operator overloading was briefly mentioned earlier, when the operators << and >> were redefined to be used with streams (like cin, cout and cerr), see section 3.1.4.

Overloading the assignment operator is probably the most common form of operator overloading in C++. A word of warning is appropriate, though. The fact that C++ allows operator overloading does not mean that this feature should indiscriminately be used. Here's what you should keep in mind:

An operator should simply do what it is designed to do. The phrase that's often encountered in the context of operator overloading is do as the ints do. The way operators behave when applied to ints is what is expected, all other implementations probably cause surprises and confusion. Therefore, overloading the insertion (<<) and extraction (>>) operators in the context of streams is probably ill-chosen: the stream operations have nothing in common with bitwise shift operations.

8.3.1.1: The member 'operator=()'

To add operator overloading to a class, the class interface is simply provided with a (usually public) member function naming the particular operator. That member function is thereupon implemented.

To overload the assignment operator =, a member operator=(Class const &rhs) is added to the class interface. Note that the function name consists of two parts: the keyword operator, followed by the operator itself. When we augment a class interface with a member function operator=, then that operator is redefined for the class, which prevents the default operator from being used. In the previous section the function assign was provided to solve the problems resulting from using the default assignment operator. Rather than using an ordinary member function C++ commonly uses a dedicated operator generalizing the operator's default behavior to the class in which it is defined.

The assign member mentioned before may be redefined as follows (the member operator= presented below is a first, rather unsophisticated, version of the overloaded assignment operator. It will shortly be improved):

    class Person
    {
        public:                             // extension of the class Person
                                            // earlier members are assumed.
            void operator=(Person const &other);
    };
Its implementation could be
    void Person::operator=(Person const &other)
    {
        delete[] d_name;                      // delete old data
        delete[] d_address;
        delete[] d_phone;

        d_name = strdupnew(other.d_name);   // duplicate other's data
        d_address = strdupnew(other.d_address);
        d_phone = strdupnew(other.d_phone);
    }
This member's actions are similar to those of the previously mentioned member assign, but this member is automatically called when the assignment operator = is used. Actually there are two ways to call overloaded operators as shown in the next example:
    void tmpPerson(Person const &person)
    {
        Person tmp;

        tmp = person;       
        tmp.operator=(person);  // the same thing
    }
Overloaded operators are seldom called explicitly, but the explicit form can be convenient when the overloaded operator is called through a pointer to an object, as the pointer then doesn't have to be dereferenced first:
    void tmpPerson(Person const &person)
    {
        Person *tmp = new Person;

        tmp->operator=(person);
        *tmp = person;          // yes, also possible...

        delete tmp;
    }

8.4: The `this' pointer

A member function of a given class is always called in combination with an object of its class. There is always an implicit `substrate' for the function to act on. C++ defines a keyword, this, to reach this substrate.

The this keyword is a pointer variable that always contains the address of the object for which the member function was called. The this pointer is implicitly declared by each member function (whether public, protected, or private). The this pointer is a constant pointer to an object of the member function's class (in const member functions it points to a const object). For example, the non-const members of the class Person implicitly declare:

    extern Person *const this;
A member function like Person::name could be implemented in two ways: with or without using the this pointer:
    char const *Person::name() const    // implicitly using `this'
    {
        return d_name;
    }

    char const *Person::name() const    // explicitly using `this'
    {
        return this->d_name;
    }
The this pointer is seldom explicitly used, but situations do exist where the this pointer is actually required (cf. chapter 16).
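The following minimal sketch illustrates what this contains. It assumes a hypothetical, stripped-down Person that merely stores a pointer to a name owned by the caller (so it is not the Person class developed in this chapter); the member address and the function thisIsObjectAddress are names introduced for this illustration only.

```cpp
#include <cassert>

// Hypothetical, stripped-down Person: it merely stores a pointer to a
// name owned by the caller, so no memory management is needed here.
class Person
{
    char const *d_name;

    public:
        explicit Person(char const *name)
        :
            d_name(name)
        {}
        char const *name() const            // explicitly using `this'
        {
            return this->d_name;
        }
        Person const *address() const       // here `this' is a Person const *
        {
            return this;
        }
};

// Returns true if `this' inside a member equals the object's address.
bool thisIsObjectAddress()
{
    Person person("Frank");
    assert(person.name() == person.address()->name());
    return person.address() == &person;
}
```

Calling thisIsObjectAddress() from main is expected to return true: inside a member function this is simply the address of the object for which the member was called.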

8.4.1: Sequential assignments and this

C++'s syntax allows for sequential assignments, with the assignment operator associating from right to left. In statements like:
    a = b = c;
the expression b = c is evaluated first, and its result in turn is assigned to a.

The implementation of the overloaded assignment operator we've encountered thus far does not permit such constructions, as it returns void.

This imperfection can easily be remedied using the this pointer. The overloaded assignment operator expects a reference to an object of its class. It can also return a reference to an object of its class. This reference can then be used as an argument in sequential assignments.

The overloaded assignment operator commonly returns a reference to the current object (i.e., *this). The next version of the overloaded assignment operator for the class Person thus becomes:

    Person &Person::operator=(Person const &other)
    {
        delete[] d_address;
        delete[] d_name;
        delete[] d_phone;

        d_address = strdupnew(other.d_address);
        d_name = strdupnew(other.d_name);
        d_phone = strdupnew(other.d_phone);

        // return current object as a reference
        return *this;
    }
Overloaded operators may themselves be overloaded. Consider the string class, having overloaded assignment operators operator=(std::string const &rhs), operator=(char const *rhs), and several more overloaded versions. These additional overloaded versions are there to handle different situations which are, as usual, recognized by their argument types. These overloaded versions all follow the same mold: when necessary dynamically allocated memory controlled by the object is deleted; new values are assigned using the overloaded operator's parameter values and *this is returned.
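As a hedged illustration of sequential assignment, the sketch below uses a hypothetical single-member class Name as a stand-in for Person. The helper strdupnew is written out here under the assumption that it duplicates a C string using new[]; the test for self-assignment is an addition of this sketch and is not part of the version shown above.

```cpp
#include <cassert>
#include <cstring>

// Assumed behavior of the strdupnew helper used in the text:
// duplicate a C string using new[], to be released with delete[].
char *strdupnew(char const *src)
{
    char *copy = new char[std::strlen(src) + 1];
    std::strcpy(copy, src);
    return copy;
}

// Hypothetical single-member stand-in for the class Person.
class Name
{
    char *d_name;

    public:
        explicit Name(char const *name)
        :
            d_name(strdupnew(name))
        {}
        Name(Name const &other)
        :
            d_name(strdupnew(other.d_name))
        {}
        ~Name()
        {
            delete[] d_name;
        }
        Name &operator=(Name const &other)
        {
            if (this != &other)         // guard added for this sketch
            {
                delete[] d_name;
                d_name = strdupnew(other.d_name);
            }
            return *this;               // allows a = b = c
        }
        char const *name() const
        {
            return d_name;
        }
};

// Assigns c to b and the result to a; true if both now hold c's name.
bool chained()
{
    Name a("a");
    Name b("b");
    Name c("c");

    a = b = c;                          // a.operator=(b.operator=(c))

    assert(a.name() != c.name());       // every object owns its own copy
    return std::strcmp(a.name(), "c") == 0 &&
           std::strcmp(b.name(), "c") == 0;
}
```

Had operator= returned void, the statement a = b = c would not compile, as there would be no value to pass to a's assignment operator.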

8.5: The copy constructor: initialization vs. assignment

Consider the class StringStore, introduced in section 8.2, once again. As it contains several primitive type data members as well as a pointer to dynamically allocated memory it needs a constructor, a destructor, and an overloaded assignment operator. In fact the class offers two constructors: in addition to the default constructor it offers a constructor expecting a char const *const * and a size_t.

Now consider the following code fragment. The statement references are discussed following the example:

    int main(int argc, char **argv)
    {
        StringStore s1(argv, argc);     // (1)
        StringStore s2;                 // (2)
        StringStore s3(s1);             // (3)

        s2 = s1;                        // (4)
    }
In the above example three objects were defined, each using a different constructor. The actually used constructor was deduced from the constructor's argument list.

The copy constructor encountered here is new. It does not result in a compilation error even though it hasn't been declared in the class interface. This takes us to the following rule:

A copy constructor is always available, even if it isn't declared in the class's interface.
The copy constructor made available by the compiler is also called the trivial copy constructor. Starting with the C++0x standard it can easily be suppressed (using the = delete idiom). The trivial copy constructor performs a byte-wise copy operation of the existing object's primitive data to the newly created object, calls copy constructors to initialize the object's class data members from their counterparts in the existing object and, when inheritance is used, calls the copy constructors of the base class(es) to initialize the new object's base classes.

Consequently, in the above example the trivial copy constructor is used at statement 3. As it performs a byte-by-byte copy operation of the object's primitive type data members, s3's d_string pointer ends up pointing to s1's array of strings. By the time s3 ceases to exist its destructor will delete its array of strings. Unfortunately d_string is of a primitive data type (a pointer) and so it also deletes s1's data. Once again we encounter wild pointers as a result of an object going out of scope.

The remedy is easy: instead of using the trivial copy constructor a copy constructor must explicitly be added to the class's interface and its definition must prevent the wild pointers, comparably to the way this was realized in the overloaded assignment operator. An object's dynamically allocated memory is duplicated, so that it will contain its own allocated data. The copy constructor is simpler than the overloaded assignment operator in that it doesn't have to delete previously allocated memory. Since the object is going to be created no previously allocated memory already exists.

StringStore's copy constructor can be implemented as follows:

    StringStore::StringStore(StringStore const &other)
    :
        d_string(new string[other.d_size]),
        d_size(other.d_size)
    {
        for (size_t idx = 0; idx != d_size; ++idx)
            d_string[idx] = other.d_string[idx];
    }

The copy constructor is always called when an object is initialized using another object of its class. Apart from the plain copy construction that we encountered thus far, here are other situations where the copy constructor is used:

Here store is used to initialize copy's return value. The returned StringStore object is a temporary, anonymous object that may be immediately used by code calling copy but no assumptions can be made about its lifetime thereafter.
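As a hedged illustration, the sketch below counts copy constructor calls in three typical situations: plain copy construction, passing an argument to a value parameter, and initializing a function's return value from an existing object. The class Counted and the functions byValue and countCopies are hypothetical names used only here; copy mirrors the function referred to in the text.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class counting how often its copy constructor is called.
class Counted
{
    static std::size_t s_copies;

    public:
        Counted() = default;
        Counted(Counted const &)
        {
            ++s_copies;
        }
        static std::size_t copies()
        {
            return s_copies;
        }
};
std::size_t Counted::s_copies = 0;

void byValue(Counted)               // the parameter is copy constructed
{}

Counted copy(Counted const &store)
{
    return store;                   // store initializes the return value
}

// Returns the number of copy constructor calls made by this function.
std::size_t countCopies()
{
    std::size_t const before = Counted::copies();

    Counted original;
    Counted second(original);       // plain copy construction
    byValue(original);              // initializing a value parameter
    Counted third(copy(original));  // initializing copy's return value

    assert(Counted::copies() > before);
    return Counted::copies() - before;
}
```

The function is expected to report at least three calls. Whether a fourth call is made (to initialize third from copy's return value) depends on the compiler, which is allowed to skip that copy.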

8.5.1: Revising 'operator=()'

The overloaded assignment operator has characteristics also encountered with the copy constructor and the destructor: The copy constructor and the destructor clearly are required. If the overloaded assignment operator also needs to return allocated memory and to assign new values to its data members couldn't the destructor and copy constructor be used for that?

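As we've seen in our discussion of the destructor (section 8.2) the destructor can explicitly be called, but that doesn't hold true for the (copy) constructor. But let's briefly summarize what an overloaded assignment operator is supposed to do:

First, it should delete the dynamically allocated memory controlled by the current object;
Second, it should reassign the current object's data members using the object that was passed as its argument.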

The second part surely looks like a copy construction. Copy construction becomes even more attractive after realizing that the copy constructor also initializes any reference data members the class might have. Realizing the copy construction part is easy: just define a local object and initialize it using the assignment operator's const reference parameter, like this:
    Strings &Strings::operator=(Strings const &other)
    {
        Strings tmp(other);
        // more to follow
        return *this;
    }
The optimization operator=(Strings tmp) is enticing, but let's postpone that for a little while (at least until section 8.6).

Now that we've done the copying part, what about the deleting part? And isn't there another slight problem as well? After all we copied all right, but not into our intended (current, *this) object.

At this point it's time to introduce swapping. Swapping two variables means that the two variables exchange their values. Many classes (e.g., std::string) offer swap members allowing us to swap two of their objects. The Standard Template Library (STL, cf. chapter 18) offers various functions related to swapping. There is even a swap generic algorithm (cf. section 19.1.61). That latter algorithm, however, begs the current question, as it is customarily implemented using the assignment operator, so it's somewhat problematic to use it when implementing the assignment operator.

As we've seen with the placement new operator, objects can be constructed in blocks of memory that are sizeof(Class) bytes large. And so, two objects of the same class each occupy sizeof(Class) bytes. To swap these objects we merely have to swap the contents of those sizeof(Class) bytes. This procedure may be applied to classes whose objects may be swapped using a member-by-member swapping operation and can also be used for classes having reference data members. Here is its implementation for a hypothetical class Class, resulting in very fast swapping:

    #include <cstring>
    
    void Class::swap(Class &other)
    {
        char buffer[sizeof(Class)];
        memcpy(buffer, &other, sizeof(Class));
        memcpy(&other, this,   sizeof(Class));
        memcpy(this,   buffer, sizeof(Class));
    }
Let's add void swap(Strings &other) to the class Strings and complete its operator= implementation:
    Strings &Strings::operator=(Strings const &other)
    {
        Strings tmp(other);
        swap(tmp);
        return *this;
    }
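This operator= implementation is generic: it can be applied to every class whose objects are directly swappable. How does it work? The local object tmp is initialized as a copy of other; swap then exchanges the contents of the current object and tmp, so the current object now holds the copy while tmp holds the current object's former data; and when operator= returns, tmp ceases to exist and its destructor deletes that former data. Nice?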
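The copy-and-swap approach can be sketched as a small, self-contained program fragment. Note the assumptions: the class below is a hypothetical minimal Strings (its two-argument constructor and the members size and at exist only for this illustration), and its swap member exchanges the data members one by one using std::swap rather than using the memcpy based implementation shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical minimal version of the class Strings.
class Strings
{
    std::string *d_string;
    std::size_t d_size;

    public:
        Strings(std::size_t size, std::string const &value)
        :
            d_string(new std::string[size]),
            d_size(size)
        {
            for (std::size_t idx = 0; idx != d_size; ++idx)
                d_string[idx] = value;
        }
        Strings(Strings const &other)
        :
            d_string(new std::string[other.d_size]),
            d_size(other.d_size)
        {
            for (std::size_t idx = 0; idx != d_size; ++idx)
                d_string[idx] = other.d_string[idx];
        }
        ~Strings()
        {
            delete[] d_string;
        }
        void swap(Strings &other)       // member-by-member swap
        {
            std::swap(d_string, other.d_string);
            std::swap(d_size, other.d_size);
        }
        Strings &operator=(Strings const &other)
        {
            Strings tmp(other);         // the copying part
            swap(tmp);                  // *this receives the copy
            return *this;               // tmp's destructor: the deleting part
        }
        std::size_t size() const
        {
            return d_size;
        }
        std::string const &at(std::size_t idx) const
        {
            return d_string[idx];
        }
};

// Returns true if dest holds its own copy of source's strings.
bool assignDemo()
{
    Strings source(2, "new");
    Strings dest(5, "old");

    dest = source;                      // copy, swap, destroy the old data

    Strings &alias = dest;
    dest = alias;                       // self-assignment needs no special test

    assert(&dest.at(0) != &source.at(0));   // no shared memory
    return dest.size() == 2 && dest.at(0) == "new" && dest.at(1) == "new";
}
```

A useful property of this design is that the current object is only modified after the copy has been constructed successfully: if copying fails, the current object keeps its original data.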

8.6: The move constructor (C++0x)

Before the advent of the C++0x standard C++ offered basically two ways to assign the information pointed to by a data member of a temporary object to an lvalue object. Either a copy constructor or reference counting had to be used. The C++0x standard adds move semantics to these two, allowing transfer of the data pointed to by a temporary object to its destination.

Our class Strings has, among other members, a data member string *d_string. Clearly, Strings should define a copy constructor, a destructor and an overloaded assignment operator.

Now design a function loadStrings(std::istream &in) extracting the strings of a Strings object from in. As the Strings object doesn't exist yet, the Strings object filled by loadStrings is returned by value. The function loadStrings returns a temporary object, which is then used to initialize an external Strings object:

    Strings loadStrings(std::istream &in)
    {
        Strings ret;
        // load the strings into 'ret'
        return ret;
    }
    // usage:
    Strings store(loadStrings(cin));
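In this example two full copies of a Strings object are required: one to initialize loadStrings's return value from ret, and one to initialize store from loadStrings's return value.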

The rvalue reference concept allows us to improve this procedure. An rvalue reference binds to an anonymous temporary (r)value and the compiler is required to do so whenever possible. We, as programmers, must inform the compiler in what situations rvalue references can be handled. We do this by providing overloaded members defining rvalue reference parameters.

One such overloaded member is the move constructor. The move constructor is a constructor defining an rvalue reference to an object of its own class as parameter. Here is the declaration of the Strings class move constructor:

    Strings(Strings &&tmp);
Move constructors are allowed to simply assign the values of pointer data members to their own pointer data members without requiring them to make a copy of the source's data first. Having done so the temporary's pointer value is set to zero to prevent its destructor from destroying data now owned by the just constructed object. The move constructor has grabbed or stolen the data from the temporary object. This is OK as the temporary cannot be referred to again (as it is anonymous, it cannot be accessed by other code) and ceases to exist shortly after the constructor's call anyway. Here is the implementation of Strings's move constructor:
    Strings::Strings(Strings &&tmp)
    :
        d_string(tmp.d_string),         // grab the temporary's data
        d_size(tmp.d_size)
    {
        tmp.d_string = 0;               // tmp's destructor now deletes nothing
    }
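To see the effect of a move constructor, consider the following sketch. It assumes a hypothetical minimal Strings class whose constructor receives the number of strings to allocate; the members data and size and the function moveDemo exist for this illustration only. The named object source is converted to an rvalue using std::move (from the header utility) so that the move constructor is selected; resetting the source's d_size as well is an extra step taken by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical minimal, move-only version of the class Strings.
class Strings
{
    std::string *d_string;
    std::size_t d_size;

    public:
        explicit Strings(std::size_t size)
        :
            d_string(new std::string[size]),
            d_size(size)
        {}
        Strings(Strings &&tmp)
        :
            d_string(tmp.d_string),     // grab tmp's data
            d_size(tmp.d_size)
        {
            tmp.d_string = 0;           // tmp no longer owns the data
            tmp.d_size = 0;             // extra step taken by this sketch
        }
        Strings(Strings const &) = delete;              // kept short:
        Strings &operator=(Strings const &) = delete;   // no copying here
        ~Strings()
        {
            delete[] d_string;          // deleting a 0-pointer is harmless
        }
        std::string const *data() const
        {
            return d_string;
        }
        std::size_t size() const
        {
            return d_size;
        }
};

// Returns true if dest took over source's memory block without copying.
bool moveDemo()
{
    Strings source(3);
    std::string const *block = source.data();

    Strings dest(std::move(source));    // calls the move constructor

    assert(source.data() == 0);         // source has been emptied
    return dest.data() == block && dest.size() == 3;
}
```

After the move both objects can safely be destroyed: dest deletes the memory block, while the destructor of source merely deletes a 0-pointer.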

Once a class becomes a move-aware class this awareness must extend to its destructor as well. With Strings this is not an issue as its destructor only executes delete[] d_string, but it becomes an issue in classes using, e.g., pointers to pointer data members. Their destructors must visit each of the array's pointers to delete the objects pointed to by the array's elements. Assuming that Strings's d_string data member was defined as string **d_string the implementation of Strings's move constructor may remain as-is, but the destructor must now inspect d_string to prevent the destruction loop from executing when it is zero:

    Strings::~Strings()
    {
        if (d_string == 0)
            return;
        for (string **end = d_string + d_size; end-- != d_string; )
            delete *end;
        delete[] d_string;
    }

In addition to the move constructor other members defining Class const & parameters may also be overloaded with members expecting Class && parameters. Here too the compiler will select these latter overloads if an anonymous temporary argument is provided. Let's consider the implications for a minute using the next example, assuming Class offers a move constructor and a copy constructor:

    Class factory();

    void fun(Class const &other);   // a
    void fun(Class &&tmp);          // b

    void callee(Class &&tmp)
    {
        Class object(factory());    // 1
        Class object2(object);      // 2
        fun(object);                // 3
        fun(factory());             // 4
        fun(tmp);                   // 5
    }

    int main()
    {
        callee(factory());
    }
At statement 5 the compiler selects fun(Class const &): although tmp was declared as an rvalue reference it has a name, and so it is treated like any other named object. Realizing that fun(tmp) might be called twice the compiler's choice is understandable. If tmp's data would have been grabbed at the first call, the second call would receive tmp without any data. But at the last call we might know that tmp is never used again and so we might like to ensure that fun(Class &&) is called. This can be realized by the following cast:
    fun(static_cast<Class &&>(tmp)); // last call!
More often, though, the shorthand fun(std::move(tmp)) is used, already performing the required cast for us. The function std::move is indirectly declared by many header files. If no header is already declaring std::move then include <utility>.
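The selection of overloads for named and anonymous arguments can be verified using a small sketch. Here the two versions of fun return the letters used to label them in the example above; the empty class Class, the string returned by callee and the function selection are assumptions made for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Class
{};

char fun(Class const &)             // a
{
    return 'a';
}

char fun(Class &&)                  // b
{
    return 'b';
}

Class factory()
{
    return Class();
}

// Returns the labels of the overloads selected for, in this order:
// a named object, an anonymous temporary, a named rvalue reference
// and a named rvalue reference passed via std::move.
std::string callee(Class &&tmp)
{
    Class object;
    std::string selected;

    selected += fun(object);            // named object
    selected += fun(factory());         // anonymous temporary
    selected += fun(tmp);               // tmp has a name
    selected += fun(std::move(tmp));    // last use of tmp

    assert(selected.size() == 4);
    return selected;
}

std::string selection()
{
    return callee(factory());
}
```

Calling selection() returns "abab": only the anonymous temporary and the explicitly cast tmp are handled by the overload expecting an rvalue reference.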

It is pointless to provide a function with an rvalue reference return type. The compiler decides whether or not to use an overloaded member expecting an rvalue reference on the basis of the provided argument: if it is an anonymous temporary it will call the overloaded member defining the rvalue reference parameter.

Classes not using pointer members pointing to memory controlled by their objects (and not having base classes doing so, see chapter 13) do not benefit from overloaded members expecting rvalue references.

The compiler, when selecting a function to call, applies a fairly simple algorithm, and also considers copy elision. This is covered shortly (section 8.7).

8.6.1: Move-only classes (C++0x)

Classes may very well allow move semantics without offering copy semantics. Most stream classes belong to this category. Extending their definition with move semantics greatly enhances their usability. Once move semantics becomes available for such classes, so called factory functions (functions returning an object constructed by the function) can easily be implemented. E.g.,
    // assume char *filename
    ifstream inStream(openIstream(filename));
For this example to work the class ifstream must offer a move constructor. This way there will at any time be only one object referring to the open istream.

Once classes offer move semantics their objects can also safely be stored in standard containers. When such containers perform reallocation (e.g., when their sizes are enlarged) they will use the objects' move constructors rather than their copy constructors. As move-only classes suppress copy semantics containers storing objects of move-only classes implement the correct behavior in that it is impossible to assign such containers to each other.
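The combination of a move-only class, a factory function and a standard container might look as follows. This is merely a sketch: Handle is a hypothetical move-only class owning a single int, and makeHandle and sumHandles are names introduced for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical move-only class owning a single dynamically allocated int.
class Handle
{
    int *d_value;

    public:
        explicit Handle(int value)
        :
            d_value(new int(value))
        {}
        Handle(Handle &&tmp)
        :
            d_value(tmp.d_value)
        {
            tmp.d_value = 0;
        }
        Handle(Handle const &) = delete;            // no copy semantics
        Handle &operator=(Handle const &) = delete;
        ~Handle()
        {
            delete d_value;
        }
        int value() const
        {
            return *d_value;
        }
};

static_assert(!std::is_copy_constructible<Handle>::value,
              "Handle objects cannot be copied");

Handle makeHandle(int value)            // factory function
{
    return Handle(value);
}

// Stores ten Handle objects in a vector and returns the sum of their values.
int sumHandles()
{
    std::vector<Handle> handles;

    for (int value = 1; value <= 10; ++value)
        handles.push_back(makeHandle(value));   // moved into the vector

    assert(handles.size() == 10);

    int sum = 0;
    for (std::size_t idx = 0; idx != handles.size(); ++idx)
        sum += handles[idx].value();
    return sum;
}
```

While the vector grows it reallocates its memory several times. As Handle offers no copy constructor the stored objects are transferred to the new memory using Handle's move constructor.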

8.7: Copy Elision and Return Value Optimization

When the compiler selects a member function (or constructor) it will do so according to a simple set of rules, matching arguments with parameter types.

Below two tables are provided. The first table should be used in cases where a function argument has a name, the second table should be used in cases where the argument is anonymous. In each table select the const or non-const column and then use the topmost overloaded function that is available having the specified parameter type.

The tables do not handle functions defining value parameters. If a function has overloads expecting, respectively, a value parameter and some form of reference parameter the compiler reports an ambiguity when such a function is called. In the following selection procedure we may assume, without loss of generality, that this ambiguity does not occur and that all parameter types are reference parameters.

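Parameter types matching a function's argument of type T, listed from the topmost (most preferred) to the last option, if the argument is:

a named argument (an lvalue or a named rvalue reference):
non-const: T &, T const &
const: T const &

an anonymous argument (an anonymous rvalue):
non-const: T &&, T const &&, T const &
const: T const &&, T const &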

The tables show that eventually all arguments can be used with a function specifying a T const & parameter. For anonymous arguments a similar catch all is available having a higher priority: T const && matches all anonymous arguments. Thus, if named and anonymous arguments are to be distinguished a T const && overloaded function will catch all temporaries.
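The selection procedure can be checked using a sketch in which all four reference overloads are available. Each overload returns the name of its parameter type; the empty class T, the two factory functions and the function selections are assumptions made for this illustration.

```cpp
#include <cassert>
#include <string>

class T
{};

std::string fun(T &)
{
    return "T &";
}

std::string fun(T const &)
{
    return "T const &";
}

std::string fun(T &&)
{
    return "T &&";
}

std::string fun(T const &&)
{
    return "T const &&";
}

T factory()                     // returns a non-const anonymous object
{
    return T();
}

T const constFactory()          // returns a const anonymous object
{
    return T();
}

// Lists the overloads selected for a named non-const, a named const,
// an anonymous non-const and an anonymous const argument.
std::string selections()
{
    T object;
    T const constObject = T();

    std::string ret = fun(object) + ", " + fun(constObject) + ", " +
                      fun(factory()) + ", " + fun(constFactory());
    assert(!ret.empty());
    return ret;
}
```

With all four overloads available selections() returns "T &, T const &, T &&, T const &&". Removing an overload makes the compiler fall back on the next available parameter type, ending at T const &.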

As we've seen the move constructor grabs the information from a temporary for its own use. That is OK as the temporary is going to be destroyed after that anyway. It also means that the temporary's data members are modified. This modification can safely be considered a non-mutating operation on the temporary. It may thus be modified even if it was passed to a function specifying a T const && parameter. In cases like these consider using a const_cast to cast away the const-ness of the rvalue reference. The Strings move constructor encountered before might therefore also have been implemented as follows, handling both Strings and Strings const anonymous temporaries:

    Strings::Strings(Strings const &&tmp)
    :
        d_string(tmp.d_string),
        d_size(tmp.d_size)
    {
        const_cast<Strings &>(tmp).d_string = 0;
    }
Having defined appropriate copy and/or move constructors it may be somewhat surprising to learn that the compiler may decide to stay clear of a copy or move operation. After all making no copy and not moving is more efficient than copying or moving.

The option the compiler has to avoid making copies (or perform move operations) is called copy elision or return value optimization. In all situations where copy or move constructions are appropriate the compiler may apply copy elision. Here are the rules. In sequence the compiler considers the following options, stopping once an option can be selected:

All modern compilers apply copy elision. Here are some examples where it may be encountered:
    class Elide
    {};

    Elide fun()         // 1
    {
        Elide ret;
        return ret;
    }

    void gun(Elide par)
    {}

    int main()
    {
        Elide elide(fun()); // 2

        gun(fun());         // 3
    }
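Whether copy elision is actually applied can be observed by counting constructor calls. The following sketch assumes a version of Elide that counts the calls of its copy and move constructors; the counter, the member count and the function constructions are hypothetical additions.

```cpp
#include <cassert>

// Hypothetical version of Elide counting its copy and move constructions.
class Elide
{
    static int s_count;

    public:
        Elide() = default;
        Elide(Elide const &)
        {
            ++s_count;
        }
        Elide(Elide &&)
        {
            ++s_count;
        }
        static int count()
        {
            return s_count;
        }
};
int Elide::s_count = 0;

Elide fun()
{
    Elide ret;
    return ret;             // ret may be constructed in the return value
}

void gun(Elide)
{}

// Returns the number of copy or move constructions that were performed.
int constructions()
{
    int const before = Elide::count();

    Elide elide(fun());     // may use fun's return value directly
    gun(fun());             // the same holds true for gun's parameter

    assert(Elide::count() >= before);
    return Elide::count() - before;
}
```

When no elision is applied at all four copy or move constructions are performed: two at each call of fun. A compiler applying copy elision everywhere it is allowed reports zero. The actual value depends on the compiler and its options.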

8.8: Plain Old Data (C++0x)

C++ inherited the struct concept from C and extended it with the class concept. Structs are still used in C++, mainly to store and pass around aggregates of different data types. A commonly used term for these structs is plain old data (pod).

The standard pod concept in C++ completely matches C's struct concept. The C++0x standard, however, relaxes these requirements to some extent. In the C++0x standard pod is considered to be a class or struct having the following characteristics:

A standard-layout class or struct
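Under the C++0x rules a pod type is both trivial and standard-layout, and both properties can be inspected using the type_traits header. The sketch below is a hedged illustration: the types Point, Named and Mixed and the function template isPod are hypothetical and exist only here.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct Point                    // plain C-style struct
{
    int x;
    int y;
};

struct Named                    // std::string member: not trivial
{
    std::string name;
};

class Mixed                     // data members having different access
{                               // rights: not standard-layout
    public:
        int d_visible;
    private:
        int d_hidden;
    public:
        int hidden() const
        {
            return d_hidden;
        }
};

static_assert(std::is_trivial<Mixed>::value, "Mixed is trivial");
static_assert(!std::is_standard_layout<Mixed>::value,
              "Mixed is not a standard-layout class");

// True if Type is both trivial and standard-layout.
template <typename Type>
bool isPod()
{
    return std::is_trivial<Type>::value &&
           std::is_standard_layout<Type>::value;
}

// Returns true if only Point is recognized as plain old data.
bool podDemo()
{
    assert(isPod<Point>());
    return !isPod<Named>() && !isPod<Mixed>();
}
```

Only Point qualifies: Named has a member whose copy constructor and destructor are not trivial, while Mixed uses different access rights for its data members.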

8.9: Conclusion

Four important extensions to classes were introduced in this chapter: the destructor, the copy constructor, the move constructor and the overloaded assignment operator. In addition the importance of swapping, especially in combination with the overloaded assignment operator, was stressed.

Classes having pointer data members, pointing to dynamically allocated memory controlled by the objects of those classes, are potential sources of memory leaks. The extensions introduced in this chapter implement the standard defense against such memory leaks.

Encapsulation (data hiding) allows us to ensure that the object's data integrity is maintained. The automatic activation of constructors and destructors greatly enhances our capabilities to ensure the data integrity of objects doing dynamic memory allocation.

A simple conclusion is therefore that classes whose objects allocate memory controlled by themselves must at least implement a destructor, an overloaded assignment operator and a copy constructor. Implementing a move constructor remains optional, but it allows us to use factory functions with classes not allowing copy construction and/or assignment.

In the end, assuming the availability of at least a copy or move constructor, the compiler might avoid them using copy elision. Copy elision is optional and may be used by the compiler in all situations where otherwise a copy or move constructor would have been used.