Thursday, March 29, 2012

How precise is a float?

in QA I have
select
sum(F_Basic_premium) TotalBasicPremium
from fat_bse_po_risk_detail(nolock)
where f_latest_sum_insured = 'Y'
-- 1. 22421075998346.898
-- 1. 22421075998346.594
select
sum(F_Basic_premium) TotalBasicPremium
from fat_bse_po_risk_detail(nolock)
where f_latest_sum_insured = 'Y'
-- 2. 22421075998347.285
-- 2. 22421075998346.680
Weird?Have you read the first paragraph under the topic "float and real" in SQL
Server Books Online?
Anith|||Float is not precise, it is approximate. If you want precise numbers, which
I assume you want with things like premiums, use the decimal datatype, in
this case decimal(17,3).
Jacco Schalkwijk
SQL Server MVP
"marcmc" <marcmc@.discussions.microsoft.com> wrote in message
news:25AC21E1-119C-4725-99F0-78B98A2FF19D@.microsoft.com...
> in QA I have
> select
> sum(F_Basic_premium) TotalBasicPremium
> from fat_bse_po_risk_detail(nolock)
> where f_latest_sum_insured = 'Y'
> -- 1. 22421075998346.898
> -- 1. 22421075998346.594
>
> select
> sum(F_Basic_premium) TotalBasicPremium
> from fat_bse_po_risk_detail(nolock)
> where f_latest_sum_insured = 'Y'
> -- 2. 22421075998347.285
> -- 2. 22421075998346.680
>
> Weird?
>|||to answer your question directly, much more accurately than you need. The
degree of accuracy depends on what number you are trying to store... If you
are trying to store one of the numbers that a float can represent, then it's
100% Dead-on balls accurate... Floats are no different than any other
datatype in that they can represent any number in a set of numbers, what
matters is the distance between the individual numbers that they can
represent exactly... It's the same as with Integers, you can represent 1, or
2 exactly, but if you try to use an integer for 0.5, then the integer is
going to be at least .5 off... Or if you use a decimal(5,2) (2 decimal
places), to represent 0,12 or 0.13 exactly, but if you try to represent
0.125...
So with floats, the only difference is that a float stores binary numbers,
not decimals, and stores them in scientific notation format, not directly, a
s
an integer or decimal datatype does... With a float, which is 64 bits wide,
53 bits are used for the mantissa, and the remaining 11 are used for the
exponent. So the mantissa can be any binary number between minus 2 to the 5
2
power, and Plus 2 to the 52 power -1
So the "accuracy", so to speak of a float, in storing the mantissa portion
could be said to be within .000000000000000111022302462515654042% (thats 1
divided by 2 to the 53nd power)
The exponent is 11 bits and is an integral type, so is is 100% dead-on balls
accurate for any integer exponent... This gives floats their Min and max
values, which are:
Negative numbers: - 1.79E + 308 through -2.23E - 308,
and for
positive numbers: 2.23E -308 through 1.79E + 308.
and of course zero (0)
So, for any real number you pick, (within the range ) there is a float
within .000000000000000111022302462515654042% of that value.
"marcmc" wrote:

> in QA I have
> select
> sum(F_Basic_premium) TotalBasicPremium
> from fat_bse_po_risk_detail(nolock)
> where f_latest_sum_insured = 'Y'
> -- 1. 22421075998346.898
> -- 1. 22421075998346.594
>
> select
> sum(F_Basic_premium) TotalBasicPremium
> from fat_bse_po_risk_detail(nolock)
> where f_latest_sum_insured = 'Y'
> -- 2. 22421075998347.285
> -- 2. 22421075998346.680
>
> Weird?
>|||Marc,
A float is very precise - about 53 binary digits of accuracy, or about
15-16 decimal digits of accuracy. (It's not the issue here, but worth
noting that using a decimal value to represent a float is itself not
precise, since the exact values stored in floats are not all exactly
representable as decimals unless you use about 50+ decimal places.)
Most likely, the reason you are seeing only 13 or 14 digits of accuracy
is because of the sum() function. There is unavoidable inaccuracy in
floating point addition: the precise sum may contain more significant
digits than the addends, because of a "carry", but floats have a fixed
number of bits of precision, and there is truncation or rounding. The
order in which the SUM() is carried out can affect the approximate
answer. Here are some examples to show the difference that can result:
set nocount on
go
select sum(1e0) + sum(0.0000000077777e0) from Northwind..Orders
select sum(1.0+0.0000000077777e0) from Northwind..Orders
select sum(1e0/1101 + 77e0) from Northwind..Orders
select sum(partialSum) from (
select sum(1e0/1101 + 77e0) as partialSum
from Northwind..Orders
group by CustomerID
) T
Steve Kass
Drew University
marcmc wrote:

>in QA I have
>select
>sum(F_Basic_premium) TotalBasicPremium
>from fat_bse_po_risk_detail(nolock)
>where f_latest_sum_insured = 'Y'
>-- 1. 22421075998346.898
>-- 1. 22421075998346.594
>
>select
>sum(F_Basic_premium) TotalBasicPremium
>from fat_bse_po_risk_detail(nolock)
>where f_latest_sum_insured = 'Y'
>-- 2. 22421075998347.285
>-- 2. 22421075998346.680
>
>Weird?
>
>|||While float is not precise, it is predictable right? Are you saying you ran
the same query, on the same data set, in the same table structures,
everything the same, other than on different servers? Why the nolock? Are
you sure that noone is changing the data? Can you give more details?
----
Louis Davidson - http://spaces.msn.com/members/drsql/
SQL Server MVP
"marcmc" <marcmc@.discussions.microsoft.com> wrote in message
news:25AC21E1-119C-4725-99F0-78B98A2FF19D@.microsoft.com...
> in QA I have
> select
> sum(F_Basic_premium) TotalBasicPremium
> from fat_bse_po_risk_detail(nolock)
> where f_latest_sum_insured = 'Y'
> -- 1. 22421075998346.898
> -- 1. 22421075998346.594
>
> select
> sum(F_Basic_premium) TotalBasicPremium
> from fat_bse_po_risk_detail(nolock)
> where f_latest_sum_insured = 'Y'
> -- 2. 22421075998347.285
> -- 2. 22421075998346.680
>
> Weird?
>|||Jacco,
I would add that precise vs. approximate is not the best way to
characterize the difference between decimals and floats. Both have
limited precision, and both represent a certain set of rational numbers
exactly.
There are two important differences:
1. Floats use a "floating-point" representation, whereas decimals use a
"fixed-point" representation.
As a result, the values a float can store have greater extremes than
decimal values, but the values a decimal can store are equally spaced on
the number line, where float-storable values are tightly packed near
zero and very far apart away from zero.
2. Decimal precision is based on base-10 digits, and float precision is
based on base-2 digits.
As a result, decimal-storable numbers can be converted to human-readable
strings more conveniently than floats. A decimal with 20 digits of
precision requires at most 22 ASCII characters to represent it exactly
in human-readable form. On the other hand, a float with ~ 54 binary
digits of precision (which is only about "15 decimal places of
accuracy," to say it in a sloppy but familiar way), can require more
than 50 ASCII characters to represent it exactly in human-readable form
(allowing "scientific notation", which saves space), since floats are
based on binary, not decimal fractions: 1/2 = 0.5, 1/4 = 0.25, 1/8 =
0.125, 1/2^ 53 = 0.[53 decimal digits]. Rarely are client tools built
to display float exactly, though they could be. Because of this, floats
often suffer unnecessary loss of accuracy when they are converted to
ASCII and back. This happens less often for decimals, because they
"play well" with ASCII representations.
That said, your remark is much better in terms of practical advice than
mine. ;)
SK
Jacco Schalkwijk wrote:

>Float is not precise, it is approximate. If you want precise numbers, which
>I assume you want with things like premiums, use the decimal datatype, in
>this case decimal(17,3).
>
>|||There is one twist - numbers in "binary scientific notation" can always
be written as
M times 2^E
for M >= 0.5 and M < 1 (any other range with endpoints in 2/1 ratio
works, too)
Represented in binary, then, every possibly value of M looks like
0.1xxxxxxx. Some
very clever person realized that if every value of M has a 1 after the
twoimal point, we
don't need to store it. So only the xxxxxx.. are stored. We avoid
wasting space, and
we don't end up with all kinds of multiple representations of the same
number, like
[written in base 2 here] 0.011 * 1000 = 0.0011 * 10000 = 0.11 * 100
... (not
wasting space and avoiding duplicates are really the same thing - math
stuff about
injections and pigeonholes)
So the 1 is not stored, and there's a "free" extra bit of accuracy.
There are also some quirks to allow the representation of some numbers
closer
to zero than +/- 10^308, with reduced precision, and for storing (and
making sense
of!) +/- infinity. A good discussion is here:
http://stevehollasch.com/cgindex/coding/ieeefloat.html
SK
CBretana wrote:
>to answer your question directly, much more accurately than you need. The
>degree of accuracy depends on what number you are trying to store... If you
>are trying to store one of the numbers that a float can represent, then it'
s
>100% Dead-on balls accurate... Floats are no different than any other
>datatype in that they can represent any number in a set of numbers, what
>matters is the distance between the individual numbers that they can
>represent exactly... It's the same as with Integers, you can represent 1, o
r
>2 exactly, but if you try to use an integer for 0.5, then the integer is
>going to be at least .5 off... Or if you use a decimal(5,2) (2 decimal
>places), to represent 0,12 or 0.13 exactly, but if you try to represent
>0.125...
>So with floats, the only difference is that a float stores binary numbers,
>not decimals, and stores them in scientific notation format, not directly,
as
>an integer or decimal datatype does... With a float, which is 64 bits wide,
>53 bits are used for the mantissa, and the remaining 11 are used for the
>exponent. So the mantissa can be any binary number between minus 2 to the
52
>power, and Plus 2 to the 52 power -1
>So the "accuracy", so to speak of a float, in storing the mantissa portion
>could be said to be within .000000000000000111022302462515654042% (thats
1
>divided by 2 to the 53nd power)
>The exponent is 11 bits and is an integral type, so is is 100% dead-on ball
s
>accurate for any integer exponent... This gives floats their Min and max
>values, which are:
>Negative numbers: - 1.79E + 308 through -2.23E - 308,
>and for
>positive numbers: 2.23E -308 through 1.79E + 308.
>and of course zero (0)
>So, for any real number you pick, (within the range ) there is a float
>within .000000000000000111022302462515654042% of that value.
>
>
>"marcmc" wrote:
>
>|||The results of arithmetic on floats is not deterministic unless the order
in which the arithmetic takes place is deterministic. Especially for
SUM() or another aggregate, we shouldn't expect the floats to be
added in the same order every time, so we shouldn't expect the same
result.
A similar thing happens with decimals, but as far as I know, it can't
be because of changes in row order. Instead, it's got to do with how
expressions are typed:
select sum(1.1101111111 + 77.0) from Northwind..Orders
select sum(1.1101111111) + sum(77.0) from Northwind..Orders
SK
Louis Davidson wrote:

>While float is not precise, it is predictable right? Are you saying you ra
n
>the same query, on the same data set, in the same table structures,
>everything the same, other than on different servers? Why the nolock? Are
>you sure that noone is changing the data? Can you give more details?
>
>|||thx All

No comments:

Post a Comment