Welcome back! In part 4 we went over the nitty-gritty of lambdas and how to store them, we explored the relation between the various C++ algorithms and containers, plus we took a stroll through some non-standard containers with exceptional capabilities.

Note: part 1 is here.

In this probably final part 5, we’ll be going over some of the most powerful stuff in modern C++: “perfect” reference counting and the concept of std::move. Note that this installment introduces some pretty unfamiliar concepts, so it may be heavier reading than earlier parts.

Memory management

Memory is frequently the most important factor determining a program’s speed and reliability. CPUs these days tend to be tremendously faster than their attached RAM, so preventing needless copies and fragmented memory access may deliver whole orders of magnitude improvements in speed.

The authors of C++ were well aware of this and have delivered functionality that makes it possible to pass around or construct objects “in place”, thus saving a lot of memory bandwidth.

Most of these techniques require some thinking, but we’re going to start with one that is nearly invisible, and explains why some C code becomes faster simply by being recompiled as C++.

Copy elision or Return Value Optimization

Ponder the following code:

struct Object
{
	// many fields
};

Object getObject(size_t i)
{
	Object obj;
	// retrieve object #i
	return obj;
}

int main()
{
	Object o = getObject(271828);
}

Seasoned C programmers will have been exhorted to never do this, starting from gloried pages of K&R. “Passing structs over the stack leads to needless copying”. Instead, we’d be passing a pointer to an Object, meticulously reset it to its default state and then fill it out.

In C, the compiler is not allowed to optimize the code as written above, and the struct Object gets copied on the return from getObject. In C++ however, not only is the compiler allowed to optimize, all of them actually do it. In effect, Object o gets constructed on the caller’s stack and filled out there, with no copying going on.

This enables code like this to be efficient:

Vector<Object> getAll(); // "Vector" from part 3
..
auto all = getAll(); // returns millions of Objects

Later we will see how C++ offers explicit opportunities to transfer ownership without copying.

Smart Pointers

In part 2 we touched on smart pointers. We also noted that memory leaks are the bane of every project, and probably the most vexing thing about writing in pure C. Every C (and C++) programmer I know has lost at least one solid week on chasing an obscure memory leak.

These problems are so large that most modern programming languages decided to incur a significant amount of overhead to implement garbage collection (GC). GC is amazing when it works, and especially lately, the overhead is now at least manageable. But as of 2018, all environments still struggle with hick-ups caused by GC runs, which always tend to happen exactly when you don’t want them to. And to be fair, this is a very difficult problem to solve, especially when many threads are involved.

C++ has therefore not implemented garbage collection. Instead, there is a judicious selection of smart pointers that perform their own cleanup. In part 2 we described std::shared_ptr as “the most do what I mean” smart pointer available, and this is true.

Such magic does not come for free however. If we look ‘inside’ a std::shared_ptr, it turns out it carries a lot of administration. First there is of course the actual pointer to the object contained. Then there is the reference count, which needs to be updated and checked atomically at all times. Next up, there may also be a custom deleter. For good reasons, this metadata is itself allocated dynamically (on the heap). So while the sizeof of a std::shared_ptr may only be 16 bytes (on a 64 bit system), effectively it uses much more memory. In one specific test, a std::shared_ptr<uint32_t> ended up using 47 bytes of memory on average.

The question of course is: can we do better?

Introducing: std::unique_ptr

The overhead of generic reference counted pointer was well known when C++ took its initial standardized form. Back then, a quirky smart pointer called std::auto_ptr was defined, but it turned out that within the C++ of 1998 it was not possible to create something useful. Making “the perfect smart pointer” required features that only became available in C++ 2011.

First, let us try some simple things (source on GitHub):

  std::unique_ptr<uint32_t> testUnique;
  uint32_t* testRaw;
  std::shared_ptr<uint32_t> testShared;

  cout << "sizeof(testUnique):\t" << sizeof(testUnique) << endl;
  cout << "sizeof(testRaw):\t" << sizeof(testRaw) << endl;
  cout << "sizeof(testShared):\t" << sizeof(testShared) << endl;

Rather amazingly this outputs:

sizeof(testUnique): 8
sizeof(testRaw):    8
sizeof(testShared): 16

You saw that right. A std::unique_ptr has no overhead over a ‘raw’ pointer. And in fact, with some judicious casting, you can find out it in contains nothing other than the pointer you put in there. That’s zero overhead.

Here’s how to use it:

void function()
{
  auto uptr = std::make_unique<uint32_t>(42);

  cout << *uptr << endl;
} // uptr contents get freed here

The first line is a shorter (and better) way to achieve:

std::unique_ptr<uint32_t> uptr = std::unique_ptr<uint32_t>(new uint32_t(42));

In general, always prefer the std::make_* form for smart pointers. For std::shared_ptr it turns two allocations into one, which is a huge win both in CPU cycles and memory consumed.

It should be noted that std::unique_ptr may be a smart pointer, but it is not a generic reference counted pointer. Or, to put it more precisely, there is always exactly one place that owns a std::unique_ptr. This is the magic of why there is no overhead: there is no reference count to store, it is always ‘1’.

std::unique_ptr cleans up only when it goes out of scope, or when it is reset or replaced.

To access the contents of a smart pointer, either deference it (with * or ->), or use the get() method if you need the actual pointer inside. Smart pointers can also be unset, and in that case evaluate as ‘false’:

  std::unique_ptr<int> iptr;

  auto p = [](const auto& a) {
    cout << "pointer is " << (a ? "" : "not ") << "set\n";
  };

  p(iptr);
  cout << (void*) iptr.get() << endl;  

  iptr = std::make_unique<int>(12);
  p(iptr);

  iptr.reset();
  p(iptr);

This prints:

pointer is not set
0
pointer is set
pointer is not set

If we attempt to copy a std::unique_ptr, the compiler stops us. It does allow us however to ‘move’ it:

  std::unique_ptr<uint32_t> uptr2;

  uptr2 = uptr; // error about 'deleted constructor'

  uptr2 = std::move(uptr); // works

The reason a simple copy is not allowed is that this would lead to a non-unique pointer: both uptr and uptr2 would refer to the same uint32_t instance.

So what is this std::move thing, and why does that work?

std::move

In part 2 we defined a SmartFP class as an example of Resource Acquisition Is Initialization (RAII). It’s intended to be used like this:

int main()
try
{
	string line;
	SmartFP sfp("/etc/passwd", r");
	stringfgets(line, sfp.d_fp); // simple wrapper
	// do stuff
}
catch(std::exception& e) 
{
	cerr << "Fatal error: " << e.what() << endl;
}

SmartFP underneath is nothing but a wrapper for fopen and fclose. It also turns fopen errors into an explanatory exception. The nice thing about RAII that it guarantees the filedescriptor won’t ever leak, even in the face of error conditions.

In part 2 we also noted that as defined, SmartFP had a problem. It performs an fclose when it goes out of scope, but what if someone copied our SmartFP instance? We would then close the same FILE pointer twice, which is very bad. Enter the move constructor:

struct SmartFP
{
  SmartFP(const char* fname, const char* mode)
  {
    d_fp = fopen(fname, mode);
    if(!d_fp)
      throw std::runtime_error("Can't open file: " + stringerror());
  }

  SmartFP(SmartFP&& src) // move constructor. Note "&&"
  {
    d_fp = src.d_fp;
    src.d_fp = 0;
  }

  ~SmartFP()
  {
    if(d_fp)
      fclose(d_fp);
  }
  FILE* d_fp{0};
};

The move constructor is the important bit. Its presence tells C++ that this class can not be copied, only moved. The semantics of a move are a true transfer of ownership.

The following may help:

SmartFP sfp("/etc/passwd", "ro");
cout << (void*) sfp.d_fp << endl;  // prints a pointer

SmartFP sfp2 = sfp;                // error
SmartFP sfp2 = std::move(sfp);     // transfer!

cout << (void*) sfp.d_fp << endl;  // prints 0
cout << (void*) sfp2.d_fp << endl; // prints same pointer

When this code runs, the FILE pointer we created on the first line gets fclosed exactly once. This is because during the move, the ‘donor’ FILE* was set to zero, and in the destructor we make sure not to fclose a 0.

This move is performed automatically on return:

SmartFP getTmpFP()
{
	// get tmp name
	return SmartFP(tmp, "w");
}

...

SmartFP fp = getTmpFP();

Additionally, the C++ standard containers are all move aware, with a special syntax to construct elements ‘in place’:

  vector<SmartFP> vec;
  vec.emplace_back("move.cc", "r");

All the parameters to emplace_back get forwarded to the SmartFP constructor, which constructs the instance straight into the std::vector - all without a single copy. When filling large containers, this can make a huge difference.

Note that if we want to, a class can have both move constructors and regular constructors. A good example of this are all the C++ standard containers, including std::string. This gives you a choice between making a real copy or transferring ownership.

Smart pointers and polymorphism

A main reason we store things as pointers is to benefit from polymorphism. The downside of pointers is of course memory management, so it would be great if smart pointers were to interoperate with base and derived classes. The wonderful news is that they do.

Based on our Event class from part 3:

  std::deque<std::unique_ptr<Event>> eventQueue;

  eventQueue.push_back(std::make_unique<PortScanEvent>("1.2.3.4"));
  eventQueue.push_back(std::make_unique<ICMPEvent>());

  for(const auto& e : eventQueue) {
    cout << e->getDescription() << endl;
  }

This all works as expected, and the contents of eventQueue get cleaned up when the container goes out of scope.

When using polymorphic classes, make sure there either is no ~destructor, or that it is declared as virtual. Otherwise std::unique_ptr will call the base class destructor. See part 3 for more details, plus this stackoverflow post

Placement new

As noted earlier, C makes it possible to ’live on the edge’, or as some of the Node.JS people said, to ‘be close to the metal’. The good thing is that C++ offers you that same ability, should you need it, and more.

When we do the following:

auto ptr = new SmartFP("/etc/passwd", "ro");

This does two things:

Allocate memory to store a SmartFP instance
Call the SmartFP constructor using that memory

Generally this is what we need. However, sometimes our memory arrives from elsewhere but we’d still like to construct objects on there. Enter placement new.

Here is an actual usecase from the PowerDNS dumresp utility:

  std::atomic<uint64_t>* g_counter;

  auto ptr = mmap(NULL, sizeof(std::atomic<uint64_t>), PROT_READ | PROT_WRITE,
		  MAP_SHARED | MAP_ANONYMOUS, -1, 0);

  g_counter = new(ptr) std::atomic<uint64_t>();
  
  for(int i = 1; i < atoi(argv[3]); ++i) {
    if(!fork())
      break;
  }

This uses mmap to allocate memory that will be shared with any child processes, and then uses fancy placement new syntax to construct a std::atomic<uint64_t> instance in that shared memory.

The code then forks the number of processed described in argv[3]. Within all these processes, a simple ++(*g_counter) works, and all update the same counter.

Based on techniques like these, it is possible to create highly efficient and easy to use interprocess communications libraries, like for example Boost Interprocess.

Some general advice

Many modern C++ projects will only have a handful of explicit calls to new or delete (or malloc/free). It is easy to audit those few calls. Restrict manual memory allocation to the cases where you really have to.

For the rest, use std::unique_ptr if you can get away with it, and std::shared_ptr when you can’t. Note that you can convert a std::unique_ptr into a std::shared_ptr efficiently, so you can change your mind:

auto unique = std::make_unique<std::string>("test");
std::shared_ptr<std::string> shared = std::move(unique);

In addition, a std::unique_ptr can also release() the pointer it owns, which means it will not get deleted automatically.

The easy ability to cheaply convert a std::unique_ptr into a std::shared_ptr or a raw pointer means that functions can return a std::unique_ptr and keep everyone happy.

On the move constructor, it pays to understand this somewhat unfamiliar construct. Classes that represent resources (like sockets, file descriptors, database connections) are naturals for having a move constructor, since this makes their semantics closely match how these resources work: should be opened and closed exactly once, and exactly when we want them to.

Summarising

Memory allocation is hard and various smart pointers provided by C++ make it easier. std::shared_ptr is luxurious but comes with baggage, std::unique_ptr is frequently good enough and carries no overhead at all.

C++ tries hard to prevent needless copying of objects and adding a move constructor makes this explicit. By using std::move it is possible to store std::unique_ptr instances in containers, which is both safe and fast.

If you have any other favorite things you’d like to see discussed or questions, please do hit me up on @bert_hu_bert or bert@hubertnet.nl.