Floating point accuracy

Initial Results from the sqr.c Test

I asked people to run the following code using default compilers and no special switches:
% cc -o sqr -O sqr.c -lm
% time ./sqr

where sqr.c is:
#include <stdio.h>
#include <math.h>
main()
{
register i;
double f=0.0;
double sqrt();

for(i=0;i<=1000000000;i++)
{
f+=sqrt( (double) i);
}
printf("Finished %20.14f\n",f);
}

and report the results.

Some early results are now in, and they're not what I expected.

The Right Answer

Although I did not know this when I began, it turns out that there's a summation formula for this that produces a proveably "right" answer which can, in turn, be approximated to any required level of arithmetic precision.

Please note that this right answer is due to the members of the sci.math.num-analysis newsgroup (most specifically Raymond Toy and Peter Montgomery).

Here's what Raymond Toy wrote in the newsgroup:

How about using the Euler-Maclaurin formula? My book has the example
n sum k^(-s) = zeta(s) + 1/(1-s)/n^(s-1) + 1/2/n^(s) k=1 - bern(2)/2*comb(s,1)/n^(s+1) ... -bern(2*k)*comb(s+2*k-2,2*k-1)/n^(s+2*k-1)
where bern(k) are the Bernoulli numbers, comb(n,k) = n!/k!/(n-k)! and zeta(s) is Riemann's zeta function.

The problem we want is s=-1/2. The first few terms are
sum sqrt(k) = 2/3*n^(3/2) + 1/2*sqrt(n) + zeta(-1/2).
If I ignore the zeta term, the remaining terms for n = 1e9 gives 21081851083600.37596259382529338.

So that's the right answer - near enough.

Some notes on the results reported below:

  1. These results are in no particular order (yet) and I have used the contributors nomenclature rather than translating to a standard terminology. i.e. "Athlon" and "AMD" are reported as if they were different;
  2. I have many duplicated reports. Where possible I've selected the first one for mention here;
  3. all "non obvious" results had to be reported by two people to be included;
    In the specific case of "off by 2.0" error reported by Michael Stajduhar, the result was duplicated by several others including Tor Slettnes and appears to be for real.
  4. the SPARC results are 100% consistent - and wrong by 0.182631 - from 1989 on.
  5. the higher speed AMD Athlon is the easy winner on accuracy
  6. compilers make far more of difference than expected

Beyond that, I'm waiting for more results - and your advice!- before drawing any conclusions. Take a look, and send me a note telling me what you think this all means.

Contributor OS CPU Mhz Compiler Compile command Result Error Time (secs)
Randall L. Rathbun Linux PII 400 GCC 2.95 with GMP 4.0 Precision set to 200 digits 21081851083600.37596259382529338 0.0 6 hours, 11 minutes, 45.9 sec
Paul Murphy Solaris 2.7 Ultra-II 450 gcc 2.95 cc -o sqr -O2 sqr.c -lm 21081851083600.55859375 -0.18263116 79.0u 0.1s 0:15 96% 0+0k 0+0io 0pf+0w
Paul Murphy SunOS 4.1.4 (Sun IPX) SP 2+ 40 K&R cc -O -o sqr sqr.c -lm 21081851083600.55859375 -0.18263116 6 hours, 57 minutes, 18 seconds
Michael Stajduhar Redhat7.1 Kernel 2.4.17 Intel PIII 933MHz gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81) gcc -lm -o test test.c 21081851083598.38281250000000 1.99219 176.350u 0.340s 3:07.78 94.0%
Michael Stajduhar Redhat7.1 Kernel 2.4.17 Intel PIII 933MHz gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81) gcc -march=i686 -O3 -lm -o test test.c 21081851083600.38281250000000 -0.0078125 74.680u 0.110s 1:14.79 100.0% 0+0k 0+0io 109pf+0w
Michael Stajduhar SUSE 7.1 Intel PI Unknown /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs gcc version 2.95.2 19991024 gcc -o test test.c -lm 21081851083598.38281250000000 1.99219 real 39m57.842s user 39m1.750s sys 0m1.210s
Michael Stajduhar SUSE 7.1 Intel PI Unknown /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs gcc version 2.95.2 19991024 gcc -O3 -o test test.c -lm 21081851083600.38281250000000 -0.0078125 real 11m56.810s user 11m48.850s sys 0m0.250s
Hubert Feyrer NetBSD PIV 1.7GHz gcc version egcs-2.91.66 19990314 (egcs-1.1.2 release) gcc -O sq.c -o sq -lm 21081851083600.55859375000000 -0.183594 112.383u 0.000s 1:52.98 99.4% 0+0k 2+75io 2pf+0w
Hubert Feyrer NetBSD PIV 1.7GHz /usr/pkg/gcc-2.95.3/lib/gcc-lib/i386--netbsdelf/2.95.3/specs gcc -O sq.c -o sq -lm 21081851083600.55859375000000 -0.183594 110.656u 0.000s 1:51.64 99.1% 0+0k 2+110io 1pf+0w
David Perry Solaris 8 Ultra III 750 gcc version 2.95.3 20010315 (Not 64 bit as far as I can tell!) gcc -o sqrO3 -O3 sqr.c -lm 21081851083600.55859375000000 -0.183594 58.40u 0.01s 1:00.76 96.1%
David Perry Darwin G4 450 Apple Computer, Inc. version gcc-932.1, based on gcc version 2.95.2 19991024 (release) gcc -o sqrO -O sqr.c -lm 21081851083600.55859375000000 -0.183594 270.930u 1.040s 4:47.78 94.5% 0+0k 0+1io 0pf+0w
Alastair Stewart Linux 2.4.7 (redhat) AMD 1.6GHz gcc 2.96-98 (redhat) cc -o sqr -O sqr.c -lm 21081851083600.38281250000000 -0.0078125 real 0m23.492s user 0m23.170s sys 0m0.000s
Pat Gunn Linux 2.4.7 athlon 750 gcc 3.0   21081851083598.38281250000000 1.99219 121.730u 0.030s 2:22.15 85.6% 0+0k 0+0io 104pf+0w
Pat Gunn Linux G3 300 gcc 2.95.2   21081851083600.55859375000000 -0.183594 515.390u 0.030s 8:35.45 99.9% 0+0k 0+0io 154pf+0w
Pat Gunn Linux alpha/21164a 533 gcc 3.0   21081851083600.55859375000000 -0.183594 570.998u 0.049s 10:14.14 92.9% 0+0k 0+0io 0pf+0w
Pat Gunn Linux alpha/21164a 533 ccc here is DEC/Compaq C compiler V6.4, cpml is the DEC/Compaq Alpha-optimized math library ccc -lcpml fptest.c -o fptest 21081851083600.55859375000000 -0.183594 151.426u 0.003s 2:31.44 99.9% 0+0k 0+0io 0pf+0w
McKerr Brian Solaris 8 UIII 750 Sun Forte 6.2 Compiler cc -o sqr -O sqr.c -lm 21081851083600.55859375000000 -0.183594 real 1:59.1 user 1:59.0 sys 0.0
McKerr Brian Red hat 7.2 PIV 1.6GHz gnu c compiler 2.96 cc -o sqr -O sqr.c -lm 21081851083600.38281250000000 -0.0078125 real 0m31.098s user 0m29.250s sys 0m0.040s
McKerr Brian Red hat 7.1 PII 400 gcc 2.96 cc -o sqr -O sqr.c -lm 21081851083600.38281250000000 -0.0078125 real 2m54.350s user 2m53.180s sys 0m1.140s
Tor Slettnes Debian GNU/Linux 2.2.19 Athlon 1.2GH gcc 2.95 gcc test.c -o test -lm 21081851083598.38281250000000 1.99219 real 2m7.523s user 2m3.160s sys 0m0.020s
L Carver Solaris 8 Ultra IIi 440 using Forte 6.1, "-O1" optimization cc -o sqr -xO2 ~/src/sqr.c -lm 21081851083600.55859375000000 -0.183594 real 174.59 user 168.24 sys 0.03
L Carver Solaris 8 Ultra IIi 440 using gcc-2.95.3 gcc -o sqr -O2 ~/src/sqr.c -lm 21081851083600.55859375000000 -0.183594 real 88.69 user 87.06 sys 0.00
L Carver Solaris 8 U III 750 gcc-2.95.3, "-O2" optimization gcc -o sqr -O2 ~/src/sqr.c -lm 21081851083600.55859375000000 -0.183594 real 57.60 user 57.47 sys 0.00
L Carver Solaris 8 U III 750 using Forte 6.1, "-O2" optimization cc -o sqr -xO2 ~/src/sqr.c -lm 21081851083600.55859375000000 -0.183594 real 119.15 user 118.89 sys 0.00
L Carver HP-UX 11 PA-8600 550 HP Ansi C B.11.11.02, "-O1" optimization cc -o sqr -O1 ~/src/sqr.c -lm 21081851083600.55859375000000 -0.183594 real 3:20.8 user 3:20.7 sys 0.1
L Carver AIX 4.3.3.0 RS-64-III 500 IBM C (ibmcxx.cmp) 3.6.6.4, "-O2" optimization cc -o sqr -O2 ~/src/sqr.c -lm 21081851083600.55859375000000 -0.183594 Real 341.19 User 170.95 System 0.01

On top of this Philip C Tsao reversed the for loop - for(i=1000000000;i>0;i--) - and discovered that most systems produce a marginally different answer. On SPARC this is unaffected by typing i as an integer and initializing it as 1E9.

Well...

I included this example in a article about mismatch between Sun strength as a company and its weakeness as a NASDAQ share and received an extended response from some Sun engineers. One them (Tarsy) has now (mid 2004) written a much more

WHile you're here, how about taking look at The Unix Guide to Defenestration?