Skip to content
 

C++ Tutorial (3)

This is part 3 of a fast paced C++ tutorial for programmers familiar with high level languages like Perl and Python.

User-defined classes im maps

The maps in the previous tutorial contained only pairs of basic types like int or std::string. In this section, we’ll see how to store arbitrary objects of user-defined classes in maps (and other STL containers).

Lifetime issues

Before we dive right in, we need to talk a little about lifetime issues of C++ objects. In most interpreted languages, you don’t have to care about lifetime of objects, because memory allocation and deallocation happen automagically. Let’s look at a typical example in Python:

#!/usr/bin/env python
# tut03_01.py -- objects in dictionaries

class Employee(object):
    "A typical employee."

    def __init__(self, name="N/A", salary=1000.0):
        self.name = name
        self.salary = salary

if __name__ == '__main__':
    e1 = Employee("e1", 2000.0)
    e2 = Employee("e2", 1800.0)
    print "e1:", e1
    print "e2:", e2

    aDict = {}
    aDict["e1"] = e1
    aDict["e2"] = e2
    print aDict

Running this program, we get:

% python tut03_01.py
e1: <__main__.Employee object at 0x800ec7750>
e2: <__main__.Employee object at 0x800ec79d0>
{'e1': <__main__.Employee object at 0x800ec7750>,
 'e2': <__main__.Employee object at 0x800ec79d0>}

If you pay close attention to the pointers to e1 and e2, you’ll notice that the values in the dictionary point to the very same objects (which is fine). This way, you can change e1 via the dictionary like this:

aDict["e1"].salary += 100

If we now print e1.salary, we get 2100.0: we’ve just modified e1.

What happens under the hood is that Python stores Employee objects in an internal data structure. Each time we assign such an object to a variable (like e1), a smart pointer to that internal data structure is stored into that variable. Ditto for dictionaries and other containers. The smart pointer maintains a reference count, so that the internal data structure will only be destroyed, when all pointers that point to it have been destroyed. In Python (and most other interpreted languages like Ruby, Perl, PHP, Java, …), variables never contain the objects themselves, but pointers to them!

Not so in C++! If we store an object in a variable, or in an STL container like std::map, std::vector, std::list etc., a copy of that object is made, and that copy will ultimately be stored.

To see this in action, let’s reimplement Employee in C++:

// tut03_01.h -- a sample Employee class

#ifndef TUT03_01_H
#define TUT03_01_H

#include <string>
#include <istream>
#include <ostream>
#include <iostream>

const float DEFAULT_SALARY = 1000.0;

class Employee
{
 public:

  Employee():
   name_("N/A"), salary_(DEFAULT_SALARY) {
     std::cerr << "Employee() called"
               << " -- this=" << this << ". [default ctor]" << std::endl;
  }

  Employee(const std::string name, const float salary) :
   name_(name), salary_(salary) {
     std::cerr << "Employee(" << name << "," << salary << ") called"
               << " -- this=" << this << ". [normal ctor]" << std::endl;
  }

  Employee(const Employee &copy) {
    name_ = copy.name_;
    salary_ = copy.salary_;
    std::cerr << "Employee(" << &copy << ") called"
              << " -- this=" << this << ". [copy ctor]" << std::endl;
  }

  ~Employee() {
    std::cerr << "~Employee([" << this << "]) called" << std::endl;
  }

  Employee& operator=(const Employee &rhs) {
    name_ = rhs.name_;
    salary_ = rhs.salary_;
    std::cerr << "operator=(" << &rhs << ") called"
              << " -- this=" << this << ". [assign operator]" << std::endl;
    return *this;
  }

  std::string getName() const { return name_; }
  float getSalary() const { return salary_; }
  void setName(const std::string newName) { name_ = newName; }
  void setSalary(const float newSalary) { salary_ = newSalary; }

  friend std::ostream& operator<<(std::ostream &ostr, const Employee &emp);
  friend std::istream& operator>>(std::istream &istr, Employee &emp);

 private:
  std::string name_;
  float salary_;
};

std::ostream &operator<<(std::ostream &ostr, const Employee &emp)
{
  return ostr << "Employee(" << emp.name_ << ","
              << emp.salary_ << "), this=" << &emp;
}

std::istream &operator>>(std::istream &istr, Employee &emp)
{
  // We first read the salary, then the name!
  istr >> emp.salary_;
  std::getline(istr, emp.name_);
  return istr;
}

#endif // TUT03_01_H

We’ll see shortly what all those constructors, operator= and member functions are for.

A simple test program? Okay, here we go:

// tut03_01.cpp -- testing the Employee class

#include "tut03_01.h"

const char *separator = "---------------------------------------------------";

int
main (int argc, char *argv[])
{
  // Calling the default constructor Employee()
  std::cout << "CALLING Employee e0;" << std::endl;
  Employee e0;
  std::cout << "e0 is " << e0 << std::endl << separator << std::endl;

  // Calling the constructor Employee(std::string, float)
  std::cout << "CALLING Employee e1(\"e1\", 1200.0);" << std::endl;
  Employee e1("e1", 1200.0);
  std::cout << "e1 is " << e1 << std::endl << separator << std::endl;

  // Calling the copy constructor
  std::cout << "CALLING Employee e2(e1); // and changing name to e2"
            << std::endl;
  Employee e2(e1);
  e2.setName("e2");
  std::cout << "e2 is now " << e2 << std::endl << separator << std::endl;

  // Testing the assignment operator by calling e2 = e0;
  std::cout << "CALLING e2 = e0;" << std::endl;
  e2 = e0;
  std::cout << "e2 is now " << e2 << std::endl << separator << std::endl;

  // Testing the dynamic case (new)
  std::cout << "CALLING Employee *empPtr = new Employee(\"e_new\", 8000.0);"
            << std::endl;
  Employee *empPtr = new Employee("e_new", 8000);
  std::cout << "*empPtr is " << *empPtr << std::endl << separator << std::endl;

  // Explicitly deleting empPtr:
  std::cout << "CALLING delete empPtr;" << std::endl;
  delete empPtr;

  // Here, e2, e1, e0 will be destroyed.
}

So let’s see what happens, when we run this program:

% c++ -o tut03_01 tut03_01.cpp

% ./tut03_01
CALLING Employee e0;
Employee() called -- this=0x7fffffffea00. [default ctor]
e0 is Employee(N/A,1000), this=0x7fffffffea00
---------------------------------------------------
CALLING Employee e1("e1", 1200.0);
Employee(e1,1200) called -- this=0x7fffffffe9f0. [normal ctor]
e1 is Employee(e1,1200), this=0x7fffffffe9f0
---------------------------------------------------
CALLING Employee e2(e1); // and changing name to e2
Employee(0x7fffffffe9f0) called -- this=0x7fffffffe9e0. [copy ctor]
e2 is now Employee(e2,1200), this=0x7fffffffe9e0
---------------------------------------------------
CALLING e2 = e0;
operator=(0x7fffffffea00) called -- this=0x7fffffffe9e0. [assign operator]
e2 is now Employee(N/A,1000), this=0x7fffffffe9e0
---------------------------------------------------
CALLING Employee *empPtr = new Employee("e_new", 8000.0);
Employee(e_new,8000) called -- this=0x800d04040. [normal ctor]
*empPtr is Employee(e_new,8000), this=0x800d04040
---------------------------------------------------
CALLING delete empPtr;
~Employee([0x800d04040]) called
~Employee([0x7fffffffe9e0]) called
~Employee([0x7fffffffe9f0]) called
~Employee([0x7fffffffea00]) called

Now pay close attention to the pointers of the objects.

So what do we see here?

  • The first Employee has been created by a default constructor, and stored in the variable e0.
  • Providing explicit parameters for the name and the salary, invokes a different constructor, i.e. the one with the signature Employee(const std::string name, const float salary). Again, an Employee object is created, and stored in the variable e1.
  • When we call Employee e2(e1), we are in fact requesting that a copy of the object stored in e1 be made, and that copy be stored into the variable e2. C++ calls the copy constructor Employee(const Employee &copy), which creates a new object and stores it into e2.
  • Finally, the expression e2 = e0 means that we effectively want the state of e0 to be copied into the object stored into e2. Important: Note that e2 must already exist! This calls the assignment operator operator=, which in turn performs the necessary state-copying.

Hadn’t we explicitly defined the copy constructor and assignment operator, the compiler would have defined them for us. In this case, the state of the source object would have been copied over to the target object in a bit-wise fashion by the compiler-generated code (which we don’t see in the source code files). In our particular case, this would be been fine. However, it is not always a good idea to rely on compiler-generated functions: what if the state contained pointers to memory allocated with new? We would have gotten shallow copy semantics, and a lot of head scratching as to who would be responsible for delete-ing that! That’s why we need explicit copy constructors and assignment operators.

There’s nothing interesting to say about the last test case with new and delete. It behaves as expected, right?

Storing Employee objects in STL containers

The most important lesson to learn is that STL containers store copies of objects. This is different from Python, whose containers always store smart pointers to objects.

To illustrate this STL behavior, the following test program:

// tut03_01a.cpp -- store an Employee object in a std::list

#include "tut03_01.h"
#include <list>
#include <cstdlib>

typedef std::list<Employee> list_t;

int
main (int argc, char *argv[])
{
  Employee e1("e1", 1000);
  list_t aList;

  aList.push_back(e1);

  return EXIT_SUCCESS;
}

would print:

% c++ -o tut03_01a tut03_01a.cpp

% ./tut03_01a
Employee(e1,1000) called -- this=0x7fffffffea30. [normal ctor]
Employee(0x7fffffffea30) called -- this=0x800d02070. [copy ctor]
~Employee([0x800d02070]) called
~Employee([0x7fffffffea30]) called

So, what’s going on here? Look at the addresses:

  • The first object created at the address 0x7fffffffea30 was stored in e1.
  • The STL implementation of std::list.push_back() effectively called Employee‘s copy constructor to generate a copy of e1. This new copy with the (different) address 0x800d02070 has been stored in the list.
  • Because aList was created after e1, it will be destroyed before e1 when the program is about to end. When the destructor of aList is called, it will walk through all objects stored in the list, calling their destructors in turn. That’s why the copied object 0x800d02070‘s destructor will be called next.
  • Even though we can’t see it, aList will be destroyed at this point, after it has destroyed all its elements.
  • Last but not least, e1 will be destroyed, i.e. the destructor for the original object 0x7fffffffea30 will get called.

As we can see, std::list stores copies of objects, and assumes responsibility for calling their destructors when necessary.

Let’s repeat this test, using a std::map instead of a std::list:

// tut03_01b.cpp -- store an Employee object in a std::map

#include "tut03_01.h"
#include <string>
#include <map>
#include <cstdlib>

typedef std::map<std::string, Employee> map_t;

int
main (int argc, char *argv[])
{
  Employee e1("e1", 1000);
  map_t aMap;

  aMap["e1"] = e1;

  return EXIT_SUCCESS;
}

What whould happen here? We already expect std::map to create a copy of e1:

% c++ -o tut03_01b tut03_01b.cpp

% ./tut03_01b
Employee(e1,1000) called -- this=0x7fffffffea20. [normal ctor]
Employee() called -- this=0x7fffffffe9a0. [default ctor]
Employee(0x7fffffffe9a0) called -- this=0x7fffffffe988. [copy ctor]
Employee(0x7fffffffe988) called -- this=0x800d03068. [copy ctor]
~Employee([0x7fffffffe988]) called
~Employee([0x7fffffffe9a0]) called
operator=(0x7fffffffea20) called -- this=0x800d03068. [assign operator]
~Employee([0x800d03068]) called
~Employee([0x7fffffffea20]) called

Woah, hold on! What’s going on here? That’s a lot more complicated that with std::list!

If we follow the addresses, this is what we gather from all this (in pseudo code):

// This is what really happens (Pseudo-Code):
Employee e1("e1", 1000);    // 0x7fffffffea20

Employee e2;                // 0x7fffffffe9a0
Employee e3(e2);            // 0x7fffffffe988
Employee e4(e3);            // 0x800d03068

~Employee(e3);
~Employee(e2);

e4 = e1;

~Employee(e4);
~Employee(e1);

So, basically, std::map creates a copy of our original object e1 and stores it (in our pseudo-code it is marked as e4) in the map.

However, this copy isn’t made directly. Instead, this particular implementation of std::map creates a few intermediary ephemeral objects (e2 and e3), which get destroyed almost immediately again. Why it does this can’t be guessed without looking at the source code of std::map.operator[].

What’s important for us here right now is not that ephemeral objects get created, but the relative order in which the copy (e4) and the original (e1) are being destroyed: just like in the previous std::list example, since aMap is being destroyed before e1, the destructor of aMap takes care of calling the destructors of all stored objects (copies), therefore destroying e4 first. Then, after aMap has been destroyed, e1 is being destroyed in turn.

By the way, hadn’t we implemented the default constructor Employee() in the class Employee, this particular implementation of std::map would have thrown a compile error at us, looking somewhat like this:

% c++ -o tut03_01b tut03_01b.cpp
/usr/include/c++/4.2/bits/stl_map.h: In member function
'_Tp& std::map<_Key, _Tp, _Compare, _Alloc>::operator[](const _Key&)
  [with _Key = std::basic_string<char, std::char_traits<char>,
                                 std::allocator<char> >,
        _Tp = Employee,
        _Compare = std::less<std::basic_string<char, std::char_traits<char>,
                             std::allocator<char> > >,
        _Alloc = std::allocator<std::pair<const std::basic_string<char,
                                                   std::char_traits<char>,
                                                   std::allocator<char> >,
                                          Employee> >]':
tut03_01b.cpp:16:   instantiated from here
/usr/include/c++/4.2/bits/stl_map.h:350: error:
   no matching function for call to 'Employee::Employee()'
tut03_01.h:24: note: candidates are: Employee::Employee(const Employee&)
tut03_01.h:14: note:                 Employee::Employee(std::string, float)

What does this cryptic message tell us? We obviously need a standard constructor Employee(), and neither our constructor Employee(const std::string name, const float salary) nor our copy constructor Employee(const Employee &copy) were able to fulfill the role of a standard constructor.

So how can we solve the problem? We can:

  • either add an Employee() standard constructor to the class Employee as we did alright,
  • or change Employee(const std::string, const float) to a constructor that accepts default parameters, as in Employee(const std::string="N/A", const float salary = 1000.0).

Storing pointers to Employees in STL containers

We’ve just seen that storing Employee objects in a std::map container incurs not only additional and gratuitous copies in the form of ephemeral and target objects, it also means per-value / copy semantics: changing an object in the STL container doesn’t affect the original object, but only the copy in the container.

If we wanted to mimic Python’s dictionary (and variable) semantics, we could store pointers to Employee objects right into the std::map, instead of storing the whole Employee objects.

A naive first try:

// tut03_01c.cpp -- store pointers to Employee objects in a std::map

#include "tut03_01.h"
#include <string>
#include <map>
#include <cstdlib>

typedef std::map<std::string, Employee *> map_t;

int
main (int argc, char *argv[])
{
  Employee e1("e1", 1000);
  map_t aMap;

  aMap["e1"] = &e1;

  return EXIT_SUCCESS;
}

seems to be running just fine:

% c++ -o tut03_01c tut03_01c.cpp

% ./tut03_01c
Employee(e1,1000) called -- this=0x7fffffffea20. [normal ctor]
~Employee([0x7fffffffea20]) called

Looks good: no more gratuitous copies nor ephemeral Employee objects. But beware, this isn’t as safe as we might imagine! How about the following code?

// tut03_01d.cpp -- store pointers to Employee objects in a std::map

#include "tut03_01.h"
#include <string>
#include <map>
#include <cstdlib>

typedef std::map<std::string, Employee *> map_t;

void
hire_at_minimum_wages(const std::string name, map_t &payroll)
{
  Employee aSlave(name, 400.0);
  payroll[name] = &aSlave;
}

int
main (int argc, char *argv[])
{
  map_t aMap;
  hire_at_minimum_wages("Hungry Programmer", aMap);
  hire_at_minimum_wages("Uncle Tom", aMap);

  for (map_t::const_iterator it = aMap.begin(); it != aMap.end(); ++it)
    std::cout << "Employee(" << it->first << "), "
              << it->second->getName() << ", $"
              << it->second->getSalary() << std::endl;

  return EXIT_SUCCESS;
}

Running it segfaults:

% c++ -o tut03_01d tut03_01d.cpp

% ./tut03_01d
Employee(Hungry Programmer,400) called -- this=0x7fffffffe940. [normal ctor]
~Employee([0x7fffffffe940]) called
Employee(Uncle Tom,400) called -- this=0x7fffffffe940. [normal ctor]
~Employee([0x7fffffffe940]) called
Bus error (core dumped)

If you’re a C or C++ programmer, you’d have already spotted the obvious error: aMap stores stale pointers to Employee objects that have already been destroyed (when the local variable aSlave in hire_at_minimum_wages() has gone out of scope)! The moment we tried to access this memory — even in a read-only manner –, we entered the realm of the scary undefined behavior.

Obviously, saving a pointer to an auto object like aSlave into a long-lived container isn’t such a bright idea!

An alternative is to modify hire_at_minimum_wages() in such a way, that it instantiates dynamic (heap) objects with new:

// tut03_01e.cpp -- store pointers to Employee objects in a std::map

#include "tut03_01.h"
#include <string>
#include <map>
#include <cstdlib>

typedef std::map<std::string, Employee *> map_t;

void
hire_at_minimum_wages(const std::string name, map_t &payroll)
{
  Employee *aSlavePtr = new Employee(name, 400.0);
  payroll[name] = aSlavePtr;
}

int
main (int argc, char *argv[])
{
  map_t aMap;
  hire_at_minimum_wages("Hungry Programmer", aMap);
  hire_at_minimum_wages("Uncle Tom", aMap);

  std::cout << "Our slaves:" << std::endl;
  for (map_t::const_iterator it = aMap.begin(); it != aMap.end(); ++it)
    std::cout << "  Employee(" << it->first << "), "
              << it->second->getName() << ", $"
              << it->second->getSalary() << std::endl;

  return EXIT_SUCCESS;
}

So let’s try it:

% c++ -o tut03_01e tut03_01e.cpp 

% ./tut03_01e
Employee(Hungry Programmer,400) called -- this=0x800d03040. [normal ctor]
Employee(Uncle Tom,400) called -- this=0x800d03050. [normal ctor]
Our slaves:
  Employee(Hungry Programmer), Hungry Programmer, $400
  Employee(Uncle Tom), Uncle Tom, $400

No more core dumps, but this time, we’ve got another problem. Can you guess which one? It’s right there to see, even if it is invisible! Hint: where have the calls to ~Employee() gone?

The problem here is that we’ve got a big fat memory leak: now that aMap doesn’t store Employee objects anymore, but pointers to Employee, aMap‘s destructor will not call ~Employee(). In fact, aMap‘s destructor will try to call the destructor of the stored pointers, but since the raw pointers used here don’t have destructors, nothing happens. Of course, delete never gets called on the new-ed objects… thus the memory leak.

Smart pointers to the rescue

The previous issue was caused by the fact that raw pointers don’t have destructors, and therefore, the destructor of the container wasn’t able to clean up dynamically allocated memory.

Instead of storing raw pointers in an STL container, we could have stored a custom object that mimics a raw pointer. This object would have to keep track of the number of references pointing to some data, and its destructor would deallocate the object (by calling delete object) when the last reference pointing to it is about to disappear. Do you recognize Python’s memory allocation scheme here?

Fortunatly, we don’t have to implement such a beast, because it already exists. Not in the C++ standard though, but in its TR1 addendum instead:

// tut03_01f.cpp -- store pointers to Employee objects in a std::map

#include "tut03_01.h"
#include <string>
#include <map>
#include <cstdlib>
#include <tr1/memory>

// If <tr1/memory> is not available for your compiler,
// #include <boost/shared_ptr.hpp>
// and use boost::shared_ptr instead of std::tr1::shared_ptr.
// Don't forget to add -I/path/to/boost/headers when compiling.
// Boost headers and library are available at http://www.boost.org/

typedef std::tr1::shared_ptr<Employee> emp_ptr_t;
typedef std::map<std::string, emp_ptr_t> map_t;

void
hire_at_minimum_wages(const std::string name, map_t &payroll)
{
  emp_ptr_t aSlavePtr(new Employee(name, 400.0));
  payroll[name] = aSlavePtr;
}

int
main (int argc, char *argv[])
{
  map_t aMap;
  hire_at_minimum_wages("Hungry Programmer", aMap);
  hire_at_minimum_wages("Uncle Tom", aMap);

  std::cout << "Our slaves:" << std::endl;
  for (map_t::const_iterator it = aMap.begin(); it != aMap.end(); ++it)
    std::cout << "  Employee(" << it->first << "), "
              << it->second->getName() << ", $"
              << it->second->getSalary() << std::endl;

  return EXIT_SUCCESS;
}

As shown in the comments, if your compiler doesn’t support TR1 headers yet, use Boost‘s boost::shared_ptr template instead, and add something like -I/usr/local/include when compiling.

So let’s try it out, one last time:

% c++ -o tut03_01f tut03_01f.cpp

% ./tut03_01f
Employee(Hungry Programmer,400) called -- this=0x800d03040. [normal ctor]
Employee(Uncle Tom,400) called -- this=0x800d03050. [normal ctor]
Our slaves:
  Employee(Hungry Programmer), Hungry Programmer, $400
  Employee(Uncle Tom), Uncle Tom, $400
~Employee([0x800d03050]) called
~Employee([0x800d03040]) called

Ain’t that sweet?

Summary

STL containers store copies of objects, and assume ownership of them; i.e. their destructor takes care of calling the destructors of all stored objects when necessary.

This copying behavior can incur an additional overhead w.r.t. CPU cycles and memory used to create ephemeral temporary objects. Moreover, those copy semantics are not the same as Python’s, Perl’s and other interpreted languages’ you may be used to.

Instead of storing objects into STL containers, one could also store just raw pointers to objects. But then, the container is no longer responsible for those objects. Extra care must be taken not to store pointers to auto (stack based) objects, and pointers to new allocated objects will leak memory.

When storing pointers to objects in STL containers, it is better to avoid raw pointers, and use a std::tr1::shared_ptr instead. If your compiler doesn’t implement the TR1 headers yet, you can use the corresponding boost::shared_ptr template from the Boost Libraries (best practices documented here).

In the next tutorial, we’ll have a look at stream iterators.