Modern C++ for C programmers: part 2

Welcome back! In part 1 I discussed how std::string and std::vector interoperate with C, including with the C standard library qsort call. We also discovered that the C++ std::sort is 40% faster than C qsort because C++ is able to inline the comparison function.

In this part we continue with further C++ features that you can use to spice up your code ‘line by line’, without immediately having to use all 1400 pages of ‘The C++ Programming Language’.

Various code samples discussed here can be found on GitHub.

If you have any favorite things you’d like to see discussed or questions, please hit me up on @bert_hu_bert or bert@hubertnet.nl

Namespaces

Namespaces allow things with identical names to live side by side. This is of immediate relevance to us since C++ defines a lot of functions and classes that might collide with names you are already using in C. Because of this, the C++ libraries live in the std:: namespace, making it far easier to compile your C code as C++.

To save a lot of typing, it is possible to import the entire std:: namespace with using namespace std, or to select individual names: using std::thread.
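
As a rough sketch of how this plays out: the namespace legacy and its sort function below are made up for illustration, standing in for existing code whose names would otherwise clash with the standard library.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical namespace wrapping existing code that defines its own sort.
namespace legacy
{
	// Stand-in for an older helper; it only reports the number of elements.
	int sort(const std::vector<int>& v)
	{
		return static_cast<int>(v.size());
	}
}

// Import a single name instead of the entire std:: namespace.
using std::vector;

int demoNamespaces()
{
	vector<int> numbers{3, 1, 2};              // std::vector, via the using declaration
	std::sort(numbers.begin(), numbers.end()); // the standard library sort
	int count = legacy::sort(numbers);         // our own sort, no collision
	return numbers[0] * 10 + count;            // smallest element is 1, count is 3
}
```

Importing individual names keeps the origin of everything else visible, which tends to be the safer choice in headers.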

C++ does have some keywords of its own, like this, class, throw, catch and reinterpret_cast, that could collide with names used in existing C code.

Classes

An older name of C++ was ‘C with classes’, and it consisted of a translator that converted this new C++ into plain C. Interestingly enough, this translator itself was written in ‘C with classes’.

Most advanced C projects already use classes almost exactly like C++. In its simplest form, a class is nothing more than a struct with some calling conventions. (Inheritance & virtual functions complicate the picture, and these optional techniques will be discussed in part 3).

Typical modern C code will define a struct that describes something and then have a bunch of functions that accept a pointer to that struct as the first parameter:

struct Circle
{
	int x, y;
	int size;
	Canvas* canvas;
	...
};

void setCanvas(Circle* circle, Canvas* canvas);
void positionCircle(Circle* circle, int x, int y);
void paintCircle(Circle* circle);

Many C projects will in fact even make (part of) these structs opaque, indicating that there are internals that API users should not see. This is done by forward declaring a struct in the .h, but never defining it. The sqlite3 handle is a great example of this technique.
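
A minimal sketch of the opaque struct technique, written so that it also compiles as C++. The Counter type and its functions are hypothetical and only serve to show the split between header and implementation.

```cpp
// --- what the header would contain ---
struct Counter;                         // declared, but never defined in the header
Counter* counterCreate();
void counterIncrement(Counter* counter);
int counterValue(const Counter* counter);
void counterDestroy(Counter* counter);

// --- what the implementation file would contain ---
struct Counter
{
	int value;   // invisible to anyone who only includes the header
};

Counter* counterCreate()
{
	return new Counter{0};
}

void counterIncrement(Counter* counter)
{
	++counter->value;
}

int counterValue(const Counter* counter)
{
	return counter->value;
}

void counterDestroy(Counter* counter)
{
	delete counter;
}
```

Users of the header can pass a Counter* around, but cannot look inside it or create one without going through counterCreate().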

A C++ class is laid out just like the struct above, and in fact, if it contains methods (member functions), these internally get called in exactly the same way:

class Circle
{
public:
	Circle(Canvas* canvas);  // "constructor"
	void position(int x, int y);
	void paint();
private:
	int d_x, d_y;
	int d_size;
	Canvas* d_canvas;
};

void Circle::paint()
{
	d_canvas->drawCircle(d_x, d_y, d_size);
}

If we look under the hood, a call like circle.position(1, 2) is actually made as Circle::position(Circle* this, int x, int y), with this pointing to circle. There is no more magic (or overhead) to it than that. In addition, the Circle::paint and Circle::position functions have d_x, d_y, d_size and d_canvas in scope.

The one difference is that these ‘private member variables’ are not accessible from the outside. This may be useful for example when any change in x needs to be coordinated with the Canvas, and we don’t want users to change x without us knowing it. As noted, many C projects achieve the same opaqueness with tricks - this is just an easier way of doing it.
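
To make the equivalence concrete, here is a small sketch that puts the C style and the C++ style side by side. CPoint, Point and their functions are invented for this example and are not part of the Circle code above.

```cpp
// C style: the object is passed explicitly as the first parameter.
struct CPoint
{
	int x, y;
};

void movePoint(CPoint* self, int dx, int dy)
{
	self->x += dx;
	self->y += dy;
}

// C++ style: the compiler passes the object as the hidden 'this' pointer.
class Point
{
public:
	Point(int x, int y) : d_x(x), d_y(y) {}
	void move(int dx, int dy)
	{
		d_x += dx;        // shorthand for this->d_x += dx
		this->d_y += dy;  // the explicit form is allowed too
	}
	int x() const { return d_x; }
	int y() const { return d_y; }
private:
	int d_x, d_y;         // only reachable through member functions
};

// Runs both versions and reports whether they end up in the same place.
bool sameResult()
{
	CPoint a{1, 2};
	movePoint(&a, 10, 20);

	Point b(1, 2);
	b.move(10, 20);

	return a.x == b.x() && a.y == b.y();
}
```

Writing b.d_x = 5 outside of the class would be rejected by the compiler, which is the enforced version of the opaqueness described above.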

Up to this point, a class was nothing but syntactic sugar and some scoping rules. However...

Resource Acquisition Is Initialization (RAII)

Most modern languages perform garbage collection because it is apparently too hard to keep track of memory. This leads to periodic GC runs which have the potential to ‘stop the world’. Even though the state of the art is improving, GC remains a fraught subject especially in a many-core world.

Although C and C++ do not do garbage collection, it remains true that it is exceptionally hard to keep track of each and every memory allocation under all (error) conditions. C++ has sophisticated ways to help you and these are built on the primitives called Constructors and Destructors.

SmartFP is an example that we’ll beef up in following sections so it becomes actually useful and safe:

struct SmartFP
{
	SmartFP(const char* fname, const char* mode)
	{
		d_fp = fopen(fname, mode);
	}
	~SmartFP()
	{
		if(d_fp)
			fclose(d_fp);
	}
	FILE* d_fp;
};

Note: a struct is the same as a class, except everything is ‘public’.

Typical use of SmartFP:

void func()
{
	SmartFP fp("/etc/passwd", "r");
	if(!fp.d_fp)
		return; // do error things

	char line[512];
	while(fgets(line, sizeof(line), fp.d_fp)) {
		// do things with line
	}	
	// note, no fclose
}

In this form, the actual call to fopen() happens when the SmartFP object is instantiated. This calls the constructor, which has the same name as the struct itself: SmartFP.

We can then use the FILE* that is stored within the class as usual. Finally, when fp goes out of scope, its destructor SmartFP::~SmartFP() gets called, which will fclose() for us if d_fp was opened successfully in the constructor.

Written like this, the code has two huge advantages: 1) the FILE pointer will never leak 2) we know exactly when it will be closed. Languages with garbage collection also guarantee ‘1’, but struggle or require real work to deliver ‘2’.

This technique to use classes or structs with constructors and destructors to own resources is called Resource Acquisition Is Initialization or RAII, and it is used widely. It is quite common for even larger C++ projects to not contain a single call to new or delete (or malloc/free) outside of a constructor/destructor pair. Or at all, in fact.
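
The sketch below illustrates the "will never leak" property for a function with several exit paths. Resource and g_inUse are hypothetical stand-ins for something like a file handle and a count of handles in use; they are not part of SmartFP.

```cpp
// Hypothetical count of resources currently held, for illustration only.
static int g_inUse = 0;

struct Resource
{
	Resource()  { ++g_inUse; }   // acquire in the constructor
	~Resource() { --g_inUse; }   // release in the destructor
};

// Every return path releases the resource, without any explicit cleanup code.
int work(int input)
{
	Resource r;
	if(input < 0)
		return -1;       // early exit: ~Resource() still runs
	if(input == 0)
		return 0;        // another exit: ~Resource() still runs
	return g_inUse;      // the value is read before r is destroyed
}
```

However many return statements get added to work() later, none of them can forget to release the resource.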

Smart pointers

Memory leaks are the bane of every project. Even with garbage collection it is possible to keep gigabytes of memory in use for a single window displaying chat messages.

C++ offers a number of so called smart pointers that can help, each with its own (dis)advantages. The most “do what I want” smart pointer is std::shared_ptr and in its most basic form it can be used like this:

void func(Canvas* canvas)
{
	std::shared_ptr<Circle> ptr(new Circle(canvas));
	// or better:
	auto ptr2 = std::make_shared<Circle>(canvas);
}

The first form shows the C++ way of doing malloc, in this case allocating memory for a Circle instance, and constructing it with the canvas parameter. As noted, most modern C++ projects rarely use “naked new” statements but mostly wrap them in infrastructure that takes care of (de)allocation.

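The second way is not only less typing but is more efficient as well: std::make_shared typically allocates the object and its reference count in a single block of memory.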

std::shared_ptr however has more tricks up its sleeve:

// make a vector of shared pointers to Circle instances
std::vector<std::shared_ptr<Circle> > circles;

void func(Canvas* canvas)
{
	auto ptr = std::make_shared<Circle>(canvas);
	circles.push_back(ptr);
	ptr->paint();
}

This first defines a vector of std::shared_ptrs to Circle, then creates such a shared_ptr and stores it in the circles vector. When func returns, ptr goes out of scope, but since a copy of it is in the vector circles, the Circle object stays alive. std::shared_ptr is therefore a reference counting smart pointer.
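
The reference count can be observed with use_count(). The following is a small sketch using a made-up Shape struct in place of Circle, so that it compiles without a Canvas.

```cpp
#include <memory>
#include <vector>

// Hypothetical minimal class, standing in for Circle.
struct Shape
{
	explicit Shape(int size) : d_size(size) {}
	int d_size;
};

std::vector<std::shared_ptr<Shape>> g_shapes;

// Creates a Shape, stores a copy of the pointer, and returns the reference
// count as seen while both 'ptr' and the copy in g_shapes exist.
long addShape(int size)
{
	auto ptr = std::make_shared<Shape>(size);
	g_shapes.push_back(ptr);      // reference count goes from 1 to 2
	return ptr.use_count();
}
```

After addShape() returns, the only remaining owner is the vector, so the count drops back to 1. Removing the entry from the vector destroys the Shape.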

std::shared_ptr has another neat feature which goes like this:

void func()
{
        FILE *fp = fopen("/etc/passwd", "r");
        if(!fp)
          return; // do error things; fclose() must never see a null pointer

        std::shared_ptr<FILE> ptr(fp, fclose);

        char buf[1024];
        fread(buf, sizeof(buf), 1, ptr.get());
}

Here we create a shared_ptr with a custom deleter called fclose. This means that ptr knows how to clean up after itself if needed, and with one line we’ve created a reference counted FILE handle.

And with this, we can now see why our earlier defined SmartFP is not very safe to use. It is possible to make a copy of it, and once that copy goes out of scope, it will ALSO close the same FILE*. std::shared_ptr saves us from having to think about these things.

The downside of std::shared_ptr is that it uses memory for the actual reference count, which also has to be made safe for multi-threaded operations. It also has to store an optional custom deleter.

C++ offers other smart pointers, the most relevant of which is std::unique_ptr. Frequently we do not need actual reference counting but only ‘clean up if we go out of scope’. This is what std::unique_ptr offers, with literally zero overhead. There are also facilities for ‘moving’ a std::unique_ptr into storage so it stays in scope. We will get back to this later.
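
A first impression of both uses, as a sketch: Widget and g_alive are hypothetical and exist only to make the object lifetime observable.

```cpp
#include <memory>
#include <utility>
#include <vector>

static int g_alive = 0;   // number of live Widget objects, for illustration

struct Widget             // hypothetical class
{
	Widget()  { ++g_alive; }
	~Widget() { --g_alive; }
};

std::vector<std::unique_ptr<Widget>> g_widgets;

void scoped()
{
	std::unique_ptr<Widget> w(new Widget);
	// ... use w ...
}   // w goes out of scope here and the Widget is destroyed

void stored()
{
	std::unique_ptr<Widget> w(new Widget);
	// A unique_ptr cannot be copied, but its ownership can be moved.
	g_widgets.push_back(std::move(w));
	// w is now empty; the Widget lives on inside g_widgets
}
```

Because copying is a compile error, the double close that plagued the copied SmartFP cannot happen here.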

Threads, atomics

Every time I used to create a thread with pthread_create in C or older C++, I’d feel bad. Having to cram all the data to launch the thread through a void pointer felt silly and dangerous.

C++ offers a powerful layer on top of the native threading system to make this all easier and safer. In addition, it has ways of easily getting data back from a thread.

A small sample:

double factorial(unsigned int limit)
{
        double ret = 1;
        for(unsigned int n = 1 ; n <= limit ; ++n)
                ret *= n;
        return ret;
}


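int main()
{
      // std::launch::async requests that each call runs on its own thread
      auto future1 = std::async(std::launch::async, factorial, 19);
      auto future2 = std::async(std::launch::async, factorial, 12);
      double result = future1.get() + future2.get();

      std::cout<<"Result is: " << result << std::endl;
}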

If no return value is required, launching a thread is as easy as:

	std::thread t(factorial, 19);
	t.join(); // or t.detach()

Like C11, C++ offers atomic operations. Using them is as simple as declaring std::atomic<uint64_t> packetcounter. Operations on packetcounter are then atomic, and there is a wide suite of ways to interrogate or update it when specific memory ordering modes are required, for example to build lock-free data structures.

Note that as in C, declaring a counter to be used from multiple threads as volatile does nothing useful. Full atomics are required, or explicit locking.
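
A minimal sketch of such a counter being updated from several threads at once. The function countPackets and its parameters are invented for this example.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

std::atomic<uint64_t> packetcounter{0};

// Starts 'numThreads' threads that each increment the counter 'perThread' times.
uint64_t countPackets(unsigned int numThreads, unsigned int perThread)
{
	packetcounter = 0;
	std::vector<std::thread> threads;
	for(unsigned int n = 0; n < numThreads; ++n) {
		threads.emplace_back([perThread]() {
			for(unsigned int i = 0; i < perThread; ++i)
				++packetcounter;        // atomic increment, no lock needed
		});
	}
	for(auto& t : threads)
		t.join();
	return packetcounter.load();
}
```

With a plain or volatile uint64_t instead of std::atomic, increments from different threads could overwrite each other and the total would be unreliable.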

Locking

Much like keeping track of memory allocations, making sure to release locks on all codepaths is hard. As usual, RAII comes to the rescue:

std::mutex g_pages_mutex;
std::map<std::string, std::string> g_pages;

void func(const std::string& url, const std::string& result)
{
	std::lock_guard<std::mutex> guard(g_pages_mutex);
	g_pages[url] = result;
}

The guard object above will keep g_pages_mutex locked for as long as needed, but will always release it when func() is done, whether through an error or not.
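
The ‘through an error or not’ part can be sketched as follows. The function names storePage and mutexIsFree, and the rule that an empty url is an error, are assumptions made for this example.

```cpp
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>

std::mutex g_pages_mutex;
std::map<std::string, std::string> g_pages;

// Stores a page, refusing empty URLs (a rule made up for this sketch).
void storePage(const std::string& url, const std::string& result)
{
	std::lock_guard<std::mutex> guard(g_pages_mutex);
	if(url.empty())
		throw std::invalid_argument("empty url");  // guard still unlocks
	g_pages[url] = result;
}

// Reports whether the mutex can be taken, i.e. nobody left it locked behind.
bool mutexIsFree()
{
	if(!g_pages_mutex.try_lock())
		return false;
	g_pages_mutex.unlock();
	return true;
}
```

With manual lock() and unlock() calls, the throw in the middle of storePage() would have needed its own unlock() to avoid leaving the mutex locked forever.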

Error handling

To be honest, error handling is a poorly solved problem in any language. We can riddle our code with checks, and at each check I wonder “what should the program actually DO if this fails”. Options are rarely good - ignore, prompt user, restart program, or log a message in hopes that someone reads it.

C++ offers exceptions which in any case have some benefits over checking every return code. The good thing about an exception is that, unlike a return code, it is not ignored by default. First let us update SmartFP so it throws exceptions:

std::string stringerror()
{
	return strerror(errno);
}

struct SmartFP
{
        SmartFP(const char* fname, const char* mode)
        {
                d_fp = fopen(fname, mode);
                if(!d_fp)
                    throw std::runtime_error("Can't open file: " + stringerror());
        }
        ~SmartFP()
        {
                fclose(d_fp);
        }
        FILE* d_fp;
};

If we now create a SmartFP and it does not throw an exception, we know it is good to use. And for error reporting, we can catch the exception:

void func2()
{
        SmartFP fp("nosuchfile", "r");

        char line[512];
        while(fgets(line, sizeof(line), fp.d_fp)) {
                // do things with line
        }       
        // note, no fclose
}

void func()
{
    func2();
}

int main()
try {
    func();
} 
catch(std::exception& e) {
    std::cerr<< "Fatal error: " << e.what() << std::endl;
}

This shows an exception being thrown from SmartFP::SmartFP which then falls ‘through’ both func2() and func() to get caught in main(). The good thing about the fallthrough is that an error will always be noticed, unlike a simple return code which could be ignored. The downside however is that the exception may get ‘caught’ very far away from where it was thrown, which can lead to surprises. This does usually lead to good error logging though.

Combined with RAII, exceptions are a very powerful technique to safely acquire resources and also deal with errors.
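
What makes the combination work is that destructors run while the exception travels up the call stack. A small sketch, in which Holder and g_held are hypothetical stand-ins for a resource-owning class and a count of resources held:

```cpp
#include <stdexcept>
#include <string>

static int g_held = 0;   // hypothetical count of resources currently held

struct Holder
{
	Holder()  { ++g_held; }
	~Holder() { --g_held; }
};

void inner()
{
	Holder h;
	throw std::runtime_error("something failed");   // h is destroyed on the way out
}

void outer()
{
	Holder h;
	inner();     // the exception passes through here, destroying this h as well
}

// Returns the error message, or an empty string if nothing was thrown.
std::string run()
{
	try {
		outer();
	}
	catch(const std::exception& e) {
		return e.what();
	}
	return "";
}
```

Neither inner() nor outer() contains any cleanup code, yet nothing is left held once the exception has been caught.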

Code that can throw exceptions is slightly slower than code that can’t but it barely shows up in profiles. Actually throwing an exception is rather heavy though, so only use it for error conditions.

Most debuggers can break on the throwing of an exception, which is a powerful debugging technique. In gdb this is done with catch throw.

As noted, no error handling technique is perfect. One thing that seems promising is the std::expected work (or boost::expected), which lets a function return either a value or an error code, and which throws an exception if you use the value without having checked for an error.

Summarising

In part 2 of ‘C++ for C programmers’, we showed how classes are a concept that is actually well used in C already, except that C++ makes it easier. In addition, C++ classes (and structs) can have constructors and destructors and these are extremely useful to make sure resources are acquired and released when needed.

Based on these primitives, C++ offers smart pointers of varying intelligence and overhead that cover most requirements.

Furthermore, C++ offers good support for threads, atomics and locking. Finally, exceptions are a powerful way of (always) dealing with errors.

Part 3 is now available.