Chapter 3. Theory

We'll be working with three different variable types:


int

Generally 32 bits, signed or unsigned. Also called 'Single Integer' in some circles. These numbers are always exact, but they have no dot in them. Even so, you can often use them for fractional calculations. A common way is to multiply all numbers by 100, gaining you 2 digits after the dot (so called 'fixed point').
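The multiply-by-100 trick can be sketched in C roughly as follows. The names fixed_t, FIXED_SCALE, fixed_make, fixed_add and fixed_mul are made up for this illustration, and the sketch assumes results that fit in 32 bits.

```c
#include <stdint.h>

/* Hypothetical names for illustration: values are stored as hundredths. */
#define FIXED_SCALE 100

typedef int32_t fixed_t;

/* Build a scaled value from its whole and hundredths parts: 12.34 -> (12, 34). */
static fixed_t fixed_make(int32_t whole, int32_t hundredths)
{
    return whole * FIXED_SCALE + hundredths;
}

/* Addition works directly on the scaled values. */
static fixed_t fixed_add(fixed_t a, fixed_t b)
{
    return a + b;
}

/* A product of two scaled values carries the scale twice, so divide it out
 * once. The intermediate is widened to 64 bits to avoid overflow; the
 * division truncates toward zero instead of rounding. */
static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) / FIXED_SCALE);
}
```

With this, 1.50 times 2.00 is fixed_mul(150, 200), which gives 300, that is 3.00. Note that anything beyond the second digit is simply cut off.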


float

Generally 32 bits. Also called 'Single Precision Floating Point'. There has been some misguided advice, with one well known author stating 'if you think you need floats you did not understand the problem'. Single precision calculations are well suited for many applications and operate at much higher speeds than double precision ones.

Floats can also hold several 'magic' values like 'infinity', '-infinity' and 'NaN' (Not a Number).

Floating point theory is actually very complicated, but luckily for us, floats mostly do the right thing. Be aware though that weird things can happen with numbers very close to zero or to 'infinity', which in the case of a float is generally somewhere around 10^38.
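To get a feel for the 'magic' values, here is a small sketch in C. It assumes an IEEE 754 style float and a C99 <math.h>. The helper names are invented for this example, and the volatile variables are only there to keep the compiler from folding the arithmetic away at compile time.

```c
#include <float.h>
#include <math.h>

/* NaN is the only value that compares unequal to itself. */
static int is_nan_by_compare(float x)
{
    return x != x;
}

/* Doubling the largest finite float (FLT_MAX, about 3.4e38) overflows
 * to +infinity. */
static float overflow_to_infinity(void)
{
    volatile float big = FLT_MAX;
    volatile float result = big * 2.0f;
    return result;
}

/* infinity minus infinity has no meaningful value, so the result is NaN. */
static float make_nan(void)
{
    volatile float inf = INFINITY;
    volatile float result = inf - inf;
    return result;
}
```

The standard macros isinf() and isnan() from <math.h> do the same checks and are the better choice in real code.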


double

Also known as Double Precision Floating Point. Generally 64 bits, offering greater precision and range. Infinity comes somewhere beyond 10^308, the smallest possible number distinguishable from zero is somewhere near 10^-308. Overkill for many calculations and slower to boot.
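The difference in precision is easy to demonstrate. The sketch below assumes IEEE 754 floats and doubles, and the function names are made up for the occasion. A float has 24 significant bits, so 2^24 + 1 cannot be stored in one, while a double handles it without trouble.

```c
#include <float.h>
#include <limits.h>

/* Width of a double in bits on this platform, 64 on IEEE 754 systems. */
static int double_bits(void)
{
    return (int)(sizeof(double) * CHAR_BIT);
}

/* In a float, 2^24 + 1 rounds back to 2^24, so adding 1 changes nothing. */
static int float_loses_one(void)
{
    volatile float f = 16777216.0f; /* 2^24 */
    volatile float sum = f + 1.0f;
    return sum == f;
}

/* A double has 53 significant bits, so the same addition is exact. */
static int double_keeps_one(void)
{
    volatile double d = 16777216.0;
    volatile double sum = d + 1.0;
    return sum != d;
}
```

The limits themselves are available as DBL_MAX and DBL_MIN in <float.h>, next to their float counterparts FLT_MAX and FLT_MIN.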

In other words, if floats aren't precise enough for your needs, you may want to rethink your problem before just moving to doubles.

There is a host of CPUs out there, and they do not all support the same features. The differences are especially large when it comes to SIMD (Single Instruction, Multiple Data) support. These are the six types of vector instruction sets that are relevant:

MMX aka 'Multimedia Extensions'

Released in 1997 and present in all common ia32 (aka 'Intel') compatible CPUs. Operates on integers only, and has no support for division, only multiplication.

SSE aka 'Streaming SIMD Extensions'

Appeared when streaming video was supposed to become hot, at a time when nobody thought this was even a remote possibility. A lot can be said about Intel, but quite often they are spot on in predicting the future. SSE operates on 4 single precision floats at a time. SSE is present on Pentium 3 and higher, and also on the Athlon XP and later AMD processors.

In the Intel compatible world, SSE is the lowest common denominator.
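'Four floats at a time' can be illustrated with gcc's vector extension, which lets you declare a 16 byte type holding four floats. This is only a sketch: the type name v4sf and the function add4_arrays are choices made for this example, and whether gcc really emits a single SSE instruction for the addition depends on the target and the compiler flags.

```c
#include <string.h>

/* gcc vector extension: four packed floats, the width of one SSE register. */
typedef float v4sf __attribute__((vector_size(16)));

/* Add four pairs of floats with one vector addition. memcpy moves the data
 * in and out of the vector type without any alignment requirements. */
static void add4_arrays(const float a[4], const float b[4], float out[4])
{
    v4sf va, vb, vr;

    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va + vb; /* all four additions are expressed as one operation */
    memcpy(out, &vr, sizeof vr);
}
```

On targets without SSE, gcc falls back to ordinary scalar code, so the result is the same, only slower.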


SSE2

Only supported by Pentium 4, Athlon 64 and Opteron. Contains an updated MMX which can process more integers at a time, as well as support for double precision math. The latter, however, handles only two doubles at a time, or alternatively four floats.
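The 'two doubles or four floats' trade-off follows directly from the register size: 16 bytes hold two 8 byte doubles or four 4 byte floats. A sketch using gcc's vector extension, with type and function names invented for this example:

```c
#include <string.h>

/* Both types fill one 16 byte register, they just slice it differently. */
typedef double v2df __attribute__((vector_size(16)));
typedef float  v4sf __attribute__((vector_size(16)));

/* Number of elements ('lanes') of a given element type in a vector type. */
#define LANES(vec_type, elem_type) (sizeof(vec_type) / sizeof(elem_type))

/* Multiply two pairs of doubles with one vector multiplication. */
static void mul2_doubles(const double a[2], const double b[2], double out[2])
{
    v2df va, vb, vr;

    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va * vb;
    memcpy(out, &vr, sizeof vr);
}
```

Here LANES(v2df, double) is 2 and LANES(v4sf, float) is 4, which is why moving from floats to doubles halves the work done per instruction.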


SSE3

Only supported by the very new Pentium 4 'Prescott'. It offers some additional opcodes which help calculate the dot products of vectors or do bulk addition of data. This is the so called 'horizontal' mode, where the parts of a single SSE2 register are combined with each other, instead of the aligned parts of two registers.

It is mentioned that these horizontal instructions can be used to speed up complex arithmetic and Fourier transforms.
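The difference between vertical and horizontal operation is easiest to see in plain C. The sketch below does not use any SSE3 instructions. It only models the results with ordinary arrays, with horizontal_add following the element order of the SSE3 HADDPS instruction. All function names are made up for this example.

```c
/* Vertical: element i of a is combined with element i of b. */
static void vertical_add(const float a[4], const float b[4], float out[4])
{
    int i;

    for (i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

/* Horizontal: neighbouring elements within each input are added together. */
static void horizontal_add(const float a[4], const float b[4], float out[4])
{
    out[0] = a[0] + a[1];
    out[1] = a[2] + a[3];
    out[2] = b[0] + b[1];
    out[3] = b[2] + b[3];
}

/* Dot product: one vertical multiply followed by two horizontal adds. */
static float dot4(const float a[4], const float b[4])
{
    float prod[4], h1[4], h2[4];
    int i;

    for (i = 0; i < 4; i++)
        prod[i] = a[i] * b[i];

    horizontal_add(prod, prod, h1); /* h1[0] = p0+p1, h1[1] = p2+p3 */
    horizontal_add(h1, h1, h2);     /* h2[0] = p0+p1+p2+p3 */
    return h2[0];
}
```

Without horizontal instructions, the final summing step of a dot product needs several shuffles to line the elements up before they can be added vertically.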

For more documentation, the author would welcome a Pentium 4 Prescott enabled system!


3DNow!

Sort of SSE, but not quite. Has some horizontal instructions too. Supported on Athlon.


Altivec

Supported on PowerPC G4 and higher. Haven't investigated this yet. There are indications that the gcc on MacOS X will not properly compile or execute the examples in this document.
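To find out which of these instruction sets your compiler is actually targeting, you can test the macros gcc predefines when a set is enabled (for example via -msse2 or -maltivec). The sketch below collects them in a bitmask. The HAS_ constants and the function name are invented for this example. Note that this reflects the compiler flags, not what the CPU you end up running on can do.

```c
/* Hypothetical flag values for this example, one bit per instruction set. */
enum {
    HAS_MMX     = 1,
    HAS_SSE     = 2,
    HAS_SSE2    = 4,
    HAS_SSE3    = 8,
    HAS_3DNOW   = 16,
    HAS_ALTIVEC = 32
};

/* Report the instruction sets that were enabled at compile time. */
static unsigned compiled_simd_sets(void)
{
    unsigned mask = 0;

#ifdef __MMX__
    mask |= HAS_MMX;
#endif
#ifdef __SSE__
    mask |= HAS_SSE;
#endif
#ifdef __SSE2__
    mask |= HAS_SSE2;
#endif
#ifdef __SSE3__
    mask |= HAS_SSE3;
#endif
#ifdef __3dNOW__
    mask |= HAS_3DNOW;
#endif
#ifdef __ALTIVEC__
    mask |= HAS_ALTIVEC;
#endif
    return mask;
}
```

A program can then check, say, compiled_simd_sets() & HAS_SSE2 before taking a code path that was written with SSE2 in mind.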

Ok, time to do some calculations!