C++ iostreams: Unexpected but legal multithreaded behaviour

In previous articles, I’ve waxed rhapsodic about how great C++ is. I also noted there, however, that every language, C++ included, has its dark sides. Some languages have an unavoidable, pervasive dark side, like being slow or hard to multithread; for C++ that dark side is mostly its complexity. In this post I want to zoom in on a specific ‘gotcha’ that recently took me several hours to resolve. I wrote this piece so that anyone running into the same issue can find it by searching the web.

You may end up at this page if your multi-threaded C++ programs suffer from duplicated or unexpectedly corrupted output, specifically after you’ve enabled the optimization std::ios_base::sync_with_stdio(false).

It turns out that sync_with_stdio determines a lot more than whether C stdio is synced with C++ iostreams.

I want to thank Stackoverflow user bames53 for the insights found in his comment here. Thanks are also due to Stefan Bühler who first pointed out my bug was likely entirely legal behaviour.

iostreams

When any new language is released, it needs a ‘Hello, world’ program. And in fact, whole languages may be graded on how easy or hard it is to output text to screen or a file.

C (or POSIX) offers two ways of doing i/o: either the lowest-level form of straight-up system calls like write(2), or the slightly more advanced stdio, which offers buffering, formatting and parsing.

As was already noted in ‘The C++ Programming Language’, in theory, C++ should be able to do better than printf("Hello %s world", "new"). This printf needs to parse its formatting string at runtime and in fact every time it is invoked. And, without special compiler help, you won’t know if the formatting %s is correctly matched up to a pointer to a string.

Enter the C++ iostreams, which were one of the first things everyone encountered in C++. In theory, cout << "Hello " << "new" << " world" could be faster than the printf above, since it could figure out what it needed to do at compile time, and thence do it faster at runtime.

In practice this was not the case, and despite valiant efforts, the C++ iostreams remained slower and more cumbersome to use than the existing C stdio. printf use is still rife & not frowned upon.

Lately however, and I’m not sure when this happened, a lot of work has been spent at least in G++ to speed up iostreams. It is now entirely feasible to use iostreams for bulk text processing.

The nitty gritty

To make coexistence between C stdio and C++ iostreams possible, by default, writing things to cout will happen in such a way that it ends up in the same buffer as when writing to stdout using stdio. So this will do the right thing:

printf("Hello, ");
cout << "new world" << endl;

This synchronization of course comes at a performance penalty, or at least, we tend to assume so. Most C++ programmers will decide that if they can get away with it, disabling any form of synchronization is a good idea. Much advice online therefore suggests doing std::ios_base::sync_with_stdio(false), frequently not noting that this needs to happen before any i/o has occurred.
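To make the ordering requirement concrete, here is a minimal sketch. The helper name disableStdioSync is my own invention for illustration and is not part of any library. It leans on the fact that sync_with_stdio returns the previous setting.

```cpp
#include <iostream>

// Hypothetical helper (the name is an assumption, not a library function).
// Switches off synchronization with C stdio and returns the previous
// setting, which is true unless synchronization was already disabled.
bool disableStdioSync()
{
  return std::ios_base::sync_with_stdio(false);
}
```

Call it as the very first statement of main(), before anything has been read or written. If it returns false, some other code already switched synchronization off earlier.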

The name sync_with_stdio certainly suggests this is purely about interoperability with C stdio. It turns out this is not the case. Disabling this synchronization fundamentally alters how cin, cout, cerr, clog and variants function.

Multi-threading

One reason for using C++ is that it supports multi-threading (or more broadly, multi-processing) very well. The original C++ standard had no words on it because back in the day, officially there were no threads. Later versions of C++ (starting with C++ 2011) dusted off the iostreams specification and added words on thread safety.

This starts off with the following:

Concurrent access to a stream object (30.8, 30.9), stream buffer object (30.6), or C Library stream (30.12) by multiple threads may result in a data race (6.8.2) unless otherwise specified (30.4). [ Note: Data races result in undefined behavior (6.8.2). — end note ] – [iostreams.threadsafety]

This is a blanket statement that bad things may happen if we do stuff to iostreams from several threads at the same time, unless there is a specific statement that says doing so is safe.

Luckily, there is the following paragraph too:

Concurrent access to a synchronized (27.5.3.4) standard iostream object’s formatted and unformatted input (27.7.2.1) and output (27.7.3.1) functions or a standard C stream by multiple threads shall not result in a data race (1.10). [Note: Users must still synchronize concurrent use of these objects and streams by multiple threads if they wish to avoid interleaved characters. — end note] – [iostream.objects.overview]

No disasters will happen on concurrent use of iostreams, although if you print out two log lines to cerr at the same time, you may find them interleaved in your output. This certainly is not pretty & hard to parse, but at least it is not illegal.

Note however that this paragraph talks only about ‘synchronized’ streams. Once we call the much recommended sync_with_stdio(false), our streams are no longer synchronized: not only not with stdio, but not at all. This means every operation on cin, cout etc. that may run concurrently with another operation on the same stream must now be protected by a mutex.

This itself is likely reason enough to never call sync_with_stdio(false) in any multi-threaded program using cout to print things.
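If you do want unsynchronized streams in a threaded program, the locking could look roughly like this. It is a sketch under my own assumptions: the names outputMutex, safePrint and runWorkers are made up for illustration, and it only helps if every operation on the stream, flushes included, goes through the same mutex.

```cpp
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical global lock guarding the shared output stream.
std::mutex outputMutex;

// Writes one complete line while holding the lock.
void safePrint(std::ostream& out, const std::string& text)
{
  std::lock_guard<std::mutex> guard(outputMutex);
  out << text << '\n';
}

// Starts `workers` threads that each print `linesEach` lines via safePrint,
// then waits for all of them to finish.
void runWorkers(std::ostream& out, int workers, int linesEach)
{
  std::vector<std::thread> pool;
  for (int w = 0; w < workers; ++w) {
    pool.emplace_back([&out, w, linesEach] {
      for (int i = 0; i < linesEach; ++i)
        safePrint(out, "worker " + std::to_string(w) + " line " + std::to_string(i));
    });
  }
  for (auto& t : pool)
    t.join();
}
```

In a real program you would pass std::cout as the stream, for example runWorkers(std::cout, 4, 50).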

Ha, but I never do output from two threads at once

We now end up at my mysterious bug, which can be reproduced with the following tiny program:

#include <iostream>
#include <thread>
#include <string>
#include <unistd.h>

using namespace std;

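// Writer thread: the only code in this program that explicitly touches cout.
void theThread()
{
  for(int counter = 0 ;; ++counter) {
    usleep(250000);  // wait 250 ms, so roughly four lines per second
    cout << "Hi "<< counter << endl;
  }
}

int main()
{
  // Switch the standard streams to unsynchronized mode.
  std::ios_base::sync_with_stdio(false);
  
  string line;
  thread t(theThread);
  // Reader: only ever names cin. Fed by `yes`, this loop never ends.
  while(getline(cin, line))
    ;
}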

If this is invoked as yes | ./repro, we may get the following output:

HHi 0
Hi Hi 1
Hi Hi 2

Where we would be expecting to see Hi 0, Hi 1, Hi 2 etc. We only ever operate on cout from theThread(), and never from main. It feels like this should be safe, but it still fails. What is going on?

tied iostreams

In its wisdom, the C++ standards committee decided that some iostreams should be tied together. This guarantees that the following works:

std::ios_base::sync_with_stdio(false);

cout << "Enter your name\n";
cin >> name;

With synchronization disabled, the initial “Enter your name” string is buffered and not immediately emitted to the terminal. Without tying, the user would be asked for input before the program has printed its prompt.

Because of the tie, any read operation on cin will trigger a flush on cout. Most helpful.
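You can inspect the tie yourself. The sketch below uses a helper name of my own choosing (cinIsTiedToCout); the only library call involved is the argument-less tie(), which returns a pointer to the tied output stream, or a null pointer if there is none.

```cpp
#include <iostream>

// Hypothetical helper: reports whether cin is currently tied to cout.
bool cinIsTiedToCout()
{
  return std::cin.tie() == &std::cout;
}
```

In a freshly started program this should return true, since the standard specifies that cin is tied to cout after initialization.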

However, this flush is a write operation! So in our sample program above, we do in fact have two threads operating on cout at the same time. Every time we read a line from yes, cout gets flushed. It is therefore entirely legal (if unexpected) for our compiler & standard library to emit odd output.

The solution to this problem is simple. Insert the following:

cin.tie(nullptr);

This breaks the tie between cin and any other iostream.
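Put together, a bounded variant of the sample program with the tie removed might look like the sketch below. The function names prepareStreams, greeter and drain are mine, chosen for illustration. I also swapped usleep for std::this_thread::sleep_for and gave the writer a fixed number of iterations so that it can be joined.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <thread>

// Hypothetical setup helper: call before any i/o has happened.
void prepareStreams()
{
  std::ios_base::sync_with_stdio(false);
  std::cin.tie(nullptr);  // reads on cin no longer flush cout
}

// Writer: the only code that touches cout.
void greeter(int count)
{
  for (int counter = 0; counter < count; ++counter) {
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    std::cout << "Hi " << counter << std::endl;
  }
}

// Reader: consumes lines until end of input and returns how many it saw.
std::size_t drain(std::istream& in)
{
  std::string line;
  std::size_t linesRead = 0;
  while (std::getline(in, line))
    ++linesRead;
  return linesRead;
}
```

In main(), call prepareStreams() first, start std::thread t(greeter, 10), run drain(std::cin) and finally t.join(). With the tie gone, the reader and the writer each work on their own stream object.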

Note: in practice, printing a \n to the terminal usually flushes the stream, because of the synchronization with stdio. This is why in the example above, we first had to disable that synchronization to exhibit the problem of ‘output only appearing after being asked for input’.

Is that the whole solution?

Reading up on various bugs filed on iostreams operating in unsynchronized mode, it appears that wise users will stick to synchronized streams for their multithreaded programs. It is far too easy to stumble when you forgo the promise from [iostream.objects.overview].

Note however that even synchronized streams may deliver interleaved output, either on a character by character basis or by emitting whole chunks of lines mixed together. There is no guarantee at all that:

cout << "Hello user #" << userno << ", welcome to " << host << endl;

... will actually be emitted without interleaving with other concurrent invocations of this line. You may easily see “Hello user #Hello user #123987,,” as valid output.

Using clog will also not help - there is no way for it to emit the various components of a log line atomically. You will need a lock.
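One way to arrange that lock is to build the whole line privately first and only hold the mutex for the final write. This is a sketch, not the one true way: logMutex, formatWelcome and logLine are hypothetical names of my own.

```cpp
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical lock shared by everyone who writes log lines.
std::mutex logMutex;

// Formats the complete line into a private buffer; no shared state involved.
std::string formatWelcome(unsigned userno, const std::string& host)
{
  std::ostringstream line;
  line << "Hello user #" << userno << ", welcome to " << host << '\n';
  return line.str();
}

// Emits an already formatted line in one guarded write, then flushes.
void logLine(std::ostream& out, const std::string& line)
{
  std::lock_guard<std::mutex> guard(logMutex);
  out << line << std::flush;
}
```

A call then looks like logLine(std::cout, formatWelcome(userno, host)). Formatting outside the lock keeps the critical section short. If your toolchain supports C++20, std::osyncstream from <syncstream> is aimed at the same problem.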

Summarising

Be very careful when using std::ios_base::sync_with_stdio(false), and if you do, also issue cin.tie(nullptr). Make sure sync_with_stdio is called before doing any i/o.

In general, be very wary of doing output operations on a single iostream from multiple threads - it may not do what you want.

Some further reading:

  • The libstdc++ bug I filed about this, where it will likely be concluded this is (unfortunately) not a bug, but intended behaviour
  • The {fmt} library is a simpler alternative for rapidly outputting text. Typically faster than printf.