I have implemented multiple payment processors in the past, and a rule I always follow is to use integers for all currency calculations. The reason is simple: floating-point arithmetic is not exact.

And if you read the IEEE 754 standard, it can be surprising how inaccurate floating-point arithmetic is.

Floating point allows us to approximate values across a wide range thanks to its special encoding, but it cannot possibly represent all the real numbers in that range, nor even all the integers in it.

Even though floating point has its drawbacks, we may still want to use it for the many computing problems where it works well and we can accept its compromises.

However, these inaccuracies are a real problem, and as software developers we must understand how this encoding works so we can still produce correct results in our applications.

How are real values represented?

When we want to store real numbers in the computer we use the float and double data types, which are 32 and 64 bits wide respectively. That is a finite number of bits we can use to store a real number, but the possible real numbers are infinite.
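
As a quick sanity check, we can ask the compiler for those sizes. A minimal sketch (sizeof reports bytes, and I'm assuming the usual 8-bit byte):

#include <stdio.h>

int main(void)
{
    /* sizeof reports bytes; multiply by 8 to get bits */
    printf("float:  %u bits\n", (unsigned)(sizeof(float) * 8));
    printf("double: %u bits\n", (unsigned)(sizeof(double) * 8));
    return 0;
}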

We want to be able to represent big numbers, but also the numbers in between them. So we need to find a way to solve this with our finite number of bits.

There have been many approaches to encoding real numbers in a computer, but the current winner is floating point, which lets us express a dynamic range without wasting bits, because the radix point floats instead of staying fixed.

This is achieved by encoding the number as three separate fields: a sign, an exponent, and a mantissa (the fractional digits).

The majority of floating-point formats follow the IEEE 754 standard to store the numbers.

The memory layout is as follows:

[------------------------------------ 32 bits ------------------------------------]

[1 bit sign][---- 8 bits exponent ----][------------ 23 bits mantissa -----------]

An example

To see how floating-point numbers are actually stored in memory, let's work through an example by converting the number 123.45 into its IEEE 754 representation.

First we need to convert 123 into binary. To convert a number into binary, we divide it by 2 and keep the remainder, repeating until we reach 0.

123 / 2 = 61  remainder 1
 61 / 2 = 30  remainder 1
 30 / 2 = 15  remainder 0
 15 / 2 =  7  remainder 1
  7 / 2 =  3  remainder 1
  3 / 2 =  1  remainder 1
  1 / 2 =  0  remainder 1

Now we take the remainders from bottom to top, and that's 123 in binary: 1111011
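
If you'd rather let the machine do this, here is a minimal sketch of the same divide-by-two procedure in C (print_binary is just a helper name I made up for this post, and it doesn't bother handling 0):

#include <stdio.h>

/* Repeatedly divide by 2 and collect the remainders,
   then print them from bottom to top. */
void print_binary(unsigned int n)
{
    char bits[32];
    int count = 0;

    while (n > 0)
    {
        bits[count++] = '0' + (n % 2); /* the remainder is the next bit */
        n /= 2;
    }

    /* the remainders come out in reverse order */
    while (count > 0)
        printf("%c", bits[--count]);
    printf("\n");
}

int main(void)
{
    print_binary(123); /* prints 1111011 */
    return 0;
}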

The next step is to convert 0.45 into binary. This process requires us to multiply by 2 repeatedly, keeping the integer part of each result as the next bit and subtracting it before the next multiplication.

0.45 x 2 = 0.9  keep 0
0.9  x 2 = 1.8  keep 1 (and subtract it before the next step)
0.8  x 2 = 1.6  keep 1
0.6  x 2 = 1.2  keep 1
0.2  x 2 = 0.4  keep 0
0.4  x 2 = 0.8  keep 0

You will notice that we already calculated 0.8, so the sequence repeats infinitely:

0.8  x 2 = 1.6  keep 1
0.6  x 2 = 1.2  keep 1
0.2  x 2 = 0.4  keep 0
0.4  x 2 = 0.8  keep 0
...

The binary representation of 0.45 is: 0.01110011001100110011001100...
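
Here is the same multiply-by-two procedure as a small C sketch. Ironically it uses a double to do the work, which is itself only an approximation of 0.45, but a double carries enough bits to reproduce this prefix of the expansion correctly; I stop after 26 bits since the pattern repeats forever:

#include <stdio.h>

int main(void)
{
    double fraction = 0.45;

    /* Multiply by 2, keep the integer part as the next bit,
       and subtract it before the next iteration. */
    printf("0.");
    for (int i = 0; i < 26; i++)
    {
        fraction *= 2.0;
        int bit = (int)fraction; /* the integer part: 0 or 1 */
        printf("%d", bit);
        fraction -= bit;
    }
    printf("...\n"); /* prints 0.01110011001100110011001100... */
    return 0;
}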

So we can say that 123.45 in binary is 1111011.01110011001100110011001100...

Now we need to transform this number into scientific notation. To do that we shift the radix point to the left until only one digit remains before it: 1.11101101110011001100110011001100... And since we shifted the point 6 positions to the left, we multiply by 2 to the power of 6.

Our number in scientific notation is: 1.11101101110011001100110011001100... x 2^6
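
We can let the C standard library confirm this normalization with frexpf, which splits a float into a fraction and a power of two. One caveat: frexpf normalizes the fraction into [0.5, 1) instead of [1, 2), so it reports 0.1111011011... x 2^7, which is exactly the same number as our 1.111011011... x 2^6:

#include <stdio.h>
#include <math.h>

int main(void)
{
    int exponent;
    /* frexpf returns f in [0.5, 1) such that value = f * 2^exponent */
    float fraction = frexpf(123.45f, &exponent);

    printf("fraction: %.9f\n", fraction); /* ~0.964453101 */
    printf("exponent: %d\n", exponent);   /* 7, i.e. 1.xxx * 2^6 */
    return 0;
}
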
Now that we have our number in scientific notation, the final step is to encode it using the IEEE 754 layout:

1 sign bit

8 bits for the exponent

23 bits for the mantissa

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm

The sign bit is either 0 or 1: 0 for positive numbers and 1 for negative numbers, so in our case we store a 0.

0 eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm

For the exponent we need to be able to store both positive and negative numbers. In our case the exponent is positive, since we shifted the radix point to the left, but we could also shift it to the right for numbers that start with many 0s (like 0.0001), so we don't waste any bits encoding unnecessary 0s. Remember that the radix point can float.

So to encode the exponent we use a bias. For single precision (32 bits) the bias is 127, so we store 127 + 6 = 133, and converting 133 into binary with the same procedure as before gives: 10000101

0 10000101 mmmmmmmmmmmmmmmmmmmmmmm

The mantissa holds the fractional bits of our scientific-notation number, and we can store 23 of them. Note that the leading 1 before the radix point is not stored: a normalized number always starts with 1, so IEEE 754 leaves it implicit and gains one extra bit of precision.
Therefore our final number is:

0 10000101 11101101110011001100110

Nicely grouped in nibbles: 0100 0010 1111 0110 1110 0110 0110 0110

As you can see, if we had more than 32 bits we could store more digits in the mantissa, which means the number we just encoded is not exact: we didn't actually store 123.45, but 123.449997.

Our number became imprecise just by storing it in floating-point format. This is why floating point should never be used in financial software or any application that requires exact results. And as you can imagine, additions and subtractions accumulate error, especially because we are dealing with scientific notation: we CANNOT add two numbers with different exponents, so we must first make the exponents match (shifting one mantissa and losing bits in the process), and only then add the values, which makes the already imprecise numbers even more inaccurate.
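
You can watch this error accumulate with a few lines of C. This sketch adds one cent (0.01, which has no exact binary representation) a hundred times; the total should be exactly 1, but it isn't:

#include <stdio.h>

int main(void)
{
    float total = 0.0f;

    /* 0.01 has no exact binary representation, so every
       addition drags a little more error into the total */
    for (int i = 0; i < 100; i++)
        total += 0.01f;

    printf("total: %.9f\n", total); /* close to, but not exactly, 1 */
    printf("total == 1.0f is %s\n", (total == 1.0f) ? "true" : "false");
    return 0;
}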

Fun exercise

If you want to verify all the calculations we just did, we can write a little program that shows us how a floating point number is stored by our machine.

#include <stdio.h>

int main(int argc, char *argv[])
{
    float v = 123.45f;

    /* Reinterpret the float's bits as an integer so we can slice
       out the fields. (This assumes long is 32 bits, as on Windows.) */
    long bits = *(long *)&v;

    long sign     = (bits >> 31) & 1;
    long exponent = (bits >> 23) & 0xFF;
    long mantissa = bits & 0x7FFFFF;

    printf("number: %f\n", v);
    printf("hex: %#012lx\n", bits);
    printf("sign: %li\n", sign);
    printf("exponent: %#010lx\n", exponent);
    printf("mantissa: %#010lx\n", mantissa);

    return 0;
}

We compile it: cl -nologo -Od -WX -W4 -wd4100 -EHsc -FC -Zi -Fefloat.exe main.cpp

And when we run it we get the following output:

number: 123.449997
hex: 0x0042f6e666
sign: 0
exponent: 0x00000085
mantissa: 0x0076e666

I printed the fields in hexadecimal to keep the output short, but you can convert those numbers into binary and you will see the same values we just calculated: 0x85 is our biased exponent 133, and 0x76e666 is our 23-bit mantissa.
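
We can also go the other way and rebuild the value from the three fields the program printed, using the decoding formula (-1)^sign x (1 + mantissa / 2^23) x 2^(exponent - 127). A minimal sketch:

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* the fields our program printed */
    long sign     = 0;
    long exponent = 0x85;     /* 133 */
    long mantissa = 0x76E666;

    /* value = (-1)^sign * (1 + mantissa / 2^23) * 2^(exponent - 127) */
    double value = (sign ? -1.0 : 1.0)
                 * (1.0 + (double)mantissa / (1 << 23))
                 * pow(2.0, (double)(exponent - 127));

    printf("reconstructed: %f\n", value); /* prints 123.449997 */
    return 0;
}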

Conclusion

Hopefully this was a clear and useful explanation. If you require exactness in your application, consider other options. In financial applications you can use integers and store your values in cents. You could also use a fixed-point library, or find another clever way to account for the errors floating point introduces into your calculations.
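
For example, a minimal sketch of the integer-cents approach (the amounts are made up):

#include <stdio.h>

int main(void)
{
    /* store money as integer cents: $123.45 -> 12345 */
    long price_cents = 12345; /* hypothetical amounts for illustration */
    long tip_cents   = 200;   /* $2.00 */

    long total_cents = price_cents + tip_cents; /* exact integer math */

    printf("total: $%ld.%02ld\n",
           total_cents / 100, total_cents % 100); /* total: $125.45 */
    return 0;
}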

Until the next time!