Thursday, January 10, 2008

Decimal floating point in .NET

In my article on binary floating point types, I mentioned the System.Decimal (or just decimal in C#) type briefly. This article gives more details about the type, including its representation and some differences between it and the more common binary floating point types. From here on, I shall just refer to it as the decimal type rather than System.Decimal, and likewise where float and double are mentioned, I mean the .NET types System.Single and System.Double respectively. To make the article easier on the eyes, I'll leave the names in normal type from here on, too.

What is the decimal type?
The decimal type is just another form of floating point number - but unlike float and double, the base used is 10. If you haven't read the article linked above, now would be a good time to read it - I won't go into the basics of floating point numbers in this article.

The decimal type has the same components as any other floating point number: a mantissa, an exponent and a sign. As usual, the sign is just a single bit, but there are 96 bits of mantissa and 5 bits of exponent. However, not all exponent combinations are valid. Only values 0-28 work, and they are effectively all negative: the numeric value is sign * mantissa / 10^exponent. This means the maximum and minimum values of the type are +/- (2^96 - 1), and the smallest non-zero number in terms of absolute magnitude is 10^-28.
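As a rough check of those limits, the following sketch compares decimal.MaxValue with 2^96 - 1 and builds the smallest positive value from a mantissa of 1 and an exponent of 28. The class and method names are made up for illustration, and it assumes System.Numerics is available (it is not part of the earliest framework versions).

```csharp
using System;
using System.Numerics;

public static class DecimalLimits
{
    // 2^96 - 1: the largest value a 96-bit mantissa can hold.
    public static BigInteger MaxMantissa()
    {
        return BigInteger.Pow(2, 96) - 1;
    }

    // Mantissa 1 with exponent 28, i.e. 1 / 10^28.
    public static decimal SmallestPositive()
    {
        return new decimal(1, 0, 0, false, 28);
    }

    public static void Main()
    {
        // Largest value is the full mantissa with an exponent of 0.
        Console.WriteLine(MaxMantissa() == new BigInteger(decimal.MaxValue)); // True
        Console.WriteLine(decimal.MinValue == -decimal.MaxValue);             // True
        Console.WriteLine(SmallestPositive());
    }
}
```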

The reason for the exponent being limited is that the mantissa is able to store 28 or 29 decimal digits (depending on its exact value). Effectively, it's as if you have 28 digits which you can set to any value you want, and you can put the decimal point anywhere from the left of the first digit to the right of the last digit. (There are some numbers where you can have a 29th digit to the left of the rest, but you can't have all combinations with 29 digits, hence the restriction.)
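The 28-versus-29 digit restriction can be seen by parsing digit strings. This is only a sketch; Fits is a hypothetical helper name, and it relies on decimal.TryParse returning false when the value is too large.

```csharp
using System;
using System.Globalization;

public static class DigitLimits
{
    // Returns true if the text parses as a decimal without overflowing.
    public static bool Fits(string text)
    {
        decimal ignored;
        return decimal.TryParse(text, NumberStyles.Number,
                                CultureInfo.InvariantCulture, out ignored);
    }

    public static void Main()
    {
        // Any 28-digit integer fits, even all nines.
        Console.WriteLine(Fits(new string('9', 28)));               // True
        // Some 29-digit integers fit: this one is 2^96 - 1.
        Console.WriteLine(Fits("79228162514264337593543950335"));   // True
        // But not every 29-digit combination does.
        Console.WriteLine(Fits(new string('9', 29)));               // False
    }
}
```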

How is a decimal stored?
A decimal is stored in 128 bits, even though only 102 are strictly necessary. It is convenient to consider the decimal as three 32-bit integers representing the mantissa, and then one integer representing the sign and exponent. The top bit of the last integer is the sign bit (set to 1 for negative numbers, in the normal way) and bits 16-23 (the low bits of the high 16-bit word) contain the exponent. The other bits must all be clear (0). This representation is the one given by decimal.GetBits(decimal), which returns an array of 4 ints.
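To make the layout concrete, here is a small sketch that pulls a decimal apart with decimal.GetBits. The array holds the low, middle and high 32 bits of the mantissa, followed by the sign/exponent integer. The class and helper names are illustrative only.

```csharp
using System;

public static class DecimalParts
{
    // Bits 16-23 of the fourth int hold the exponent
    // (the power of ten the mantissa is divided by).
    public static int Exponent(decimal value)
    {
        int flags = decimal.GetBits(value)[3];
        return (flags >> 16) & 0xFF;
    }

    // The top bit of the fourth int is the sign bit,
    // so the int is negative exactly when the decimal is.
    public static bool IsNegative(decimal value)
    {
        int flags = decimal.GetBits(value)[3];
        return flags < 0;
    }

    public static void Main()
    {
        decimal d = -123.45m;
        int[] bits = decimal.GetBits(d);
        // Prints: low=12345 mid=0 high=0
        Console.WriteLine("low={0} mid={1} high={2}", bits[0], bits[1], bits[2]);
        // Prints: exponent=2 negative=True
        Console.WriteLine("exponent={0} negative={1}", Exponent(d), IsNegative(d));
    }
}
```

So -123.45m is stored as a mantissa of 12345, an exponent of 2 and a set sign bit.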

Formatting decimals
Unlike floats and doubles, when .NET is asked to format a decimal into a string representation, its default behaviour is to give the exact value. This means there is no need for a decimal equivalent of the DoubleConverter code of the binary floating point article. You can, of course, ask it to restrict the value to a specific precision.
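For instance, a minimal sketch of the two behaviours might look like this. The helper names are invented, "F2" is the standard fixed-point format specifier with two decimal places, and the invariant culture is used only so the decimal separator is predictable.

```csharp
using System;
using System.Globalization;

public static class DecimalFormatting
{
    // Default formatting: every stored digit is shown.
    public static string Exact(decimal value)
    {
        return value.ToString(CultureInfo.InvariantCulture);
    }

    // Fixed-point formatting restricted to two decimal places.
    public static string TwoPlaces(decimal value)
    {
        return value.ToString("F2", CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        decimal d = 1.2345m;
        Console.WriteLine(Exact(d));     // 1.2345
        Console.WriteLine(TwoPlaces(d)); // 1.23
    }
}
```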

Keeping zeroes
Between .NET 1.0 and 1.1, the decimal type underwent a subtle change. Consider the following simple program:

using System;

public class Test
{
    static void Main()
    {
        decimal d = 1.00m;
        Console.WriteLine (d);
    }
}



When I first ran the above (or something similar) I expected it to output just 1 (which is what it would have been on .NET 1.0) - but in fact, the output was 1.00. The decimal type doesn't normalize itself - it remembers how many decimal digits it has (by maintaining the exponent where possible) and on formatting, zero may be counted as a significant decimal digit. I don't know the exact nature of what exponent is chosen (where there is a choice) when two different decimals are multiplied, divided, added etc, but you may find it interesting to play around with programs such as the following:

using System;

public class Test
{
    static void Main()
    {
        decimal d = 0.00000000000010000m;
        while (d != 0m)
        {
            Console.WriteLine (d);
            d = d/5m;
        }
    }
}



Which produces a result of:

0.00000000000010000
0.00000000000002000
0.00000000000000400
0.00000000000000080
0.00000000000000016
0.000000000000000032
0.0000000000000000064
0.00000000000000000128
0.000000000000000000256
0.0000000000000000000512
0.00000000000000000001024
0.000000000000000000002048
0.0000000000000000000004096
0.00000000000000000000008192
0.000000000000000000000016384
0.0000000000000000000000032768
0.0000000000000000000000006554
0.0000000000000000000000001311
0.0000000000000000000000000262
0.0000000000000000000000000052
0.000000000000000000000000001
0.0000000000000000000000000002
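For addition and multiplication the pattern is easier to see. The sketch below reads the exponent back out of decimal.GetBits; Scale is a hypothetical helper name. As far as I know, the C# language specification describes the result's exponent (before any rounding) as the larger of the two operands' exponents for addition and the sum of them for multiplication, but treat the output as something to experiment with rather than a guarantee.

```csharp
using System;

public static class ScaleDemo
{
    // Number of digits after the decimal point that the value remembers.
    public static int Scale(decimal value)
    {
        return (decimal.GetBits(value)[3] >> 16) & 0xFF;
    }

    public static void Main()
    {
        decimal a = 1.0m;   // exponent 1
        decimal b = 1.00m;  // exponent 2

        decimal sum = a + b;
        decimal product = a * b;

        // Addition keeps the larger exponent: 2.00
        Console.WriteLine("{0} (exponent {1})", sum, Scale(sum));
        // Multiplication adds the exponents: 1.000
        Console.WriteLine("{0} (exponent {1})", product, Scale(product));
    }
}
```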



Everything's a number
The decimal type has no concept of infinity or NaN (not-a-number) values, and despite the above examples of the same actual number being potentially representable in different forms (eg 1, 1.0, 1.00) the normal == operator copes with these and reports that 1.0m == 1.00m is true, and so on.
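A short sketch of both points: equal values compare equal even when they format differently, and an operation that would give infinity in binary floating point throws instead. The class and helper names are made up for the example.

```csharp
using System;
using System.Globalization;

public static class NoSpecialValues
{
    // True if x + y is too large for decimal; there is no infinity to fall back on.
    public static bool AddOverflows(decimal x, decimal y)
    {
        try
        {
            decimal unused = x + y;
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // True if dividing x by zero throws rather than yielding infinity or NaN.
    public static bool DivideByZeroThrows(decimal x)
    {
        decimal zero = 0m;
        try
        {
            decimal unused = x / zero;
            return false;
        }
        catch (DivideByZeroException)
        {
            return true;
        }
    }

    public static void Main()
    {
        decimal a = 1.0m;
        decimal b = 1.00m;
        Console.WriteLine(a == b); // True: same value
        Console.WriteLine(a.ToString(CultureInfo.InvariantCulture) ==
                          b.ToString(CultureInfo.InvariantCulture)); // False: "1.0" vs "1.00"

        Console.WriteLine(AddOverflows(decimal.MaxValue, 1m)); // True
        Console.WriteLine(DivideByZeroThrows(1m));             // True
    }
}
```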

Accuracy
The decimal type has a larger precision than any of the built-in binary floating point types in .NET, although it has a smaller range of potential exponents. Also, many operations which yield surprising results in binary floating point due to inexact representations of the original operands go away in decimal floating point, precisely because many operands are specifically represented in source code as decimals.

However, that doesn't mean that all operations suddenly become accurate: a third still isn't exactly representable, for instance. The potential problems are just the same as they are with binary floating point. However, most of the time the decimal type is chosen for quantities like money, where operations will be simple and keep things accurate. (For instance, adding a tax which is specified as a percentage will keep the numbers accurate, assuming they're in a sensible range to start with.) Just be aware of which operations are likely to cause inaccuracy, and which aren't.

As a very broad rule of thumb, if you end up seeing a very long string representation (ie most of the 28/29 digits are non-zero) then chances are you've got some inaccuracy along the way: most of the uses of the decimal type won't end up using very many significant figures when the numbers are exact. If you find yourself using inaccurate numbers, you should make sure that you expected it, and consider why you're using the decimal type in the first place. (In some situations it'll make sense despite the performance hit; in many it won't.)
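The following sketch contrasts the cases mentioned above. The figures (a price of 19.99 and a rate of 17.5%) and all the names are invented purely for illustration.

```csharp
using System;
using System.Globalization;

public static class AccuracyDemo
{
    // Applies a percentage tax; exact as long as the result fits in 28 digits.
    public static decimal AddTax(decimal price, decimal percent)
    {
        return price * (1m + percent / 100m);
    }

    // A third cannot be stored exactly, so multiplying back does not give 1.
    public static decimal ThirdTimesThree()
    {
        decimal one = 1m;
        decimal three = 3m;
        return one / three * three;
    }

    public static void Main()
    {
        // Operands written as decimals are represented exactly.
        decimal x = 0.1m, y = 0.2m;
        Console.WriteLine(x + y == 0.3m);   // True

        // The same sum in binary floating point is not exactly 0.3.
        double p = 0.1, q = 0.2;
        Console.WriteLine(p + q == 0.3);    // False

        // Inexact: note the long run of digits in the output.
        Console.WriteLine(ThirdTimesThree().ToString(CultureInfo.InvariantCulture));

        // Exact: a simple money-style calculation.
        Console.WriteLine(AddTax(19.99m, 17.5m).ToString(CultureInfo.InvariantCulture));
    }
}
```

The long string of nines from the third example is the kind of output the rule of thumb above is warning about.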
