I asked people to run the following code using
default compilers and no special switches:
% cc -o sqr -O sqr.c -lm
% time ./sqr
where sqr.c is:
#include <stdio.h>
#include <math.h>
main()
{
register i;
double f=0.0;
double sqrt();
for(i=0;i<=1000000000;i++)
{
f+=sqrt( (double) i);
}
printf("Finished %20.14f\n",f);
}
and report the results.
Some early results are now in, and they're not what I expected.
Although I did not know this when I began, it turns out that there's a summation formula for this that produces a proveably "right" answer which can, in turn, be approximated to any required level of arithmetic precision.
Please note that this right answer is due to the members of the sci.math.num-analysis newsgroup (most specifically Raymond Toy and Peter Montgomery).
Here's what Raymond Toy wrote in the newsgroup:
How about using the Euler-Maclaurin formula? My book has the example
n sum k^(-s) = zeta(s) + 1/(1-s)/n^(s-1) + 1/2/n^(s) k=1 - bern(2)/2*comb(s,1)/n^(s+1) ... -bern(2*k)*comb(s+2*k-2,2*k-1)/n^(s+2*k-1)
where bern(k) are the Bernoulli numbers, comb(n,k) = n!/k!/(n-k)! and zeta(s) is Riemann's zeta function.
The problem we want is s=-1/2. The first few terms are
sum sqrt(k) = 2/3*n^(3/2) + 1/2*sqrt(n) + zeta(-1/2).
If I ignore the zeta term, the remaining terms for n = 1e9 gives 21081851083600.37596259382529338.
So that's the right answer - near enough.
Some notes on the results reported below:
Beyond that, I'm waiting for more results - and your advice!- before drawing any conclusions. Take a look, and send me a note telling me what you think this all means.
Contributor | OS | CPU | Mhz | Compiler | Compile command | Result | Error | Time (secs) |
---|---|---|---|---|---|---|---|---|
Randall L. Rathbun | Linux | PII | 400 | GCC 2.95 with GMP 4.0 | Precision set to 200 digits | 21081851083600.37596259382529338 | 0.0 | 6 hours, 11 minutes, 45.9 sec |
Paul Murphy | Solaris 2.7 | Ultra-II | 450 | gcc 2.95 | cc -o sqr -O2 sqr.c -lm | 21081851083600.55859375 | -0.18263116 | 79.0u 0.1s 0:15 96% 0+0k 0+0io 0pf+0w |
Paul Murphy | SunOS 4.1.4 (Sun IPX) | SP 2+ | 40 | K&R | cc -O -o sqr sqr.c -lm | 21081851083600.55859375 | -0.18263116 | 6 hours, 57 minutes, 18 seconds |
Michael Stajduhar | Redhat7.1 Kernel 2.4.17 | Intel PIII | 933MHz | gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81) | gcc -lm -o test test.c | 21081851083598.38281250000000 | 1.99219 | 176.350u 0.340s 3:07.78 94.0% |
Michael Stajduhar | Redhat7.1 Kernel 2.4.17 | Intel PIII | 933MHz | gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81) | gcc -march=i686 -O3 -lm -o test test.c | 21081851083600.38281250000000 | -0.0078125 | 74.680u 0.110s 1:14.79 100.0% 0+0k 0+0io 109pf+0w |
Michael Stajduhar | SUSE 7.1 | Intel PI | Unknown | /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs gcc version 2.95.2 19991024 | gcc -o test test.c -lm | 21081851083598.38281250000000 | 1.99219 | real 39m57.842s user 39m1.750s sys 0m1.210s |
Michael Stajduhar | SUSE 7.1 | Intel PI | Unknown | /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs gcc version 2.95.2 19991024 | gcc -O3 -o test test.c -lm | 21081851083600.38281250000000 | -0.0078125 | real 11m56.810s user 11m48.850s sys 0m0.250s |
Hubert Feyrer | NetBSD | PIV | 1.7GHz | gcc version egcs-2.91.66 19990314 (egcs-1.1.2 release) | gcc -O sq.c -o sq -lm | 21081851083600.55859375000000 | -0.183594 | 112.383u 0.000s 1:52.98 99.4% 0+0k 2+75io 2pf+0w |
Hubert Feyrer | NetBSD | PIV | 1.7GHz | /usr/pkg/gcc-2.95.3/lib/gcc-lib/i386--netbsdelf/2.95.3/specs | gcc -O sq.c -o sq -lm | 21081851083600.55859375000000 | -0.183594 | 110.656u 0.000s 1:51.64 99.1% 0+0k 2+110io 1pf+0w |
David Perry | Solaris 8 | Ultra III | 750 | gcc version 2.95.3 20010315 (Not 64 bit as far as I can tell!) | gcc -o sqrO3 -O3 sqr.c -lm | 21081851083600.55859375000000 | -0.183594 | 58.40u 0.01s 1:00.76 96.1% |
David Perry | Darwin | G4 | 450 | Apple Computer, Inc. version gcc-932.1, based on gcc version 2.95.2 19991024 (release) | gcc -o sqrO -O sqr.c -lm | 21081851083600.55859375000000 | -0.183594 | 270.930u 1.040s 4:47.78 94.5% 0+0k 0+1io 0pf+0w |
Alastair Stewart | Linux 2.4.7 (redhat) | AMD | 1.6GHz | gcc 2.96-98 (redhat) | cc -o sqr -O sqr.c -lm | 21081851083600.38281250000000 | -0.0078125 | real 0m23.492s user 0m23.170s sys 0m0.000s |
Pat Gunn | Linux 2.4.7 | athlon | 750 | gcc 3.0 | 21081851083598.38281250000000 | 1.99219 | 121.730u 0.030s 2:22.15 85.6% 0+0k 0+0io 104pf+0w | |
Pat Gunn | Linux | G3 | 300 | gcc 2.95.2 | 21081851083600.55859375000000 | -0.183594 | 515.390u 0.030s 8:35.45 99.9% 0+0k 0+0io 154pf+0w | |
Pat Gunn | Linux | alpha/21164a | 533 | gcc 3.0 | 21081851083600.55859375000000 | -0.183594 | 570.998u 0.049s 10:14.14 92.9% 0+0k 0+0io 0pf+0w | |
Pat Gunn | Linux | alpha/21164a | 533 | ccc here is DEC/Compaq C compiler V6.4, cpml is the DEC/Compaq Alpha-optimized math library | ccc -lcpml fptest.c -o fptest | 21081851083600.55859375000000 | -0.183594 | 151.426u 0.003s 2:31.44 99.9% 0+0k 0+0io 0pf+0w |
McKerr Brian | Solaris 8 | UIII | 750 | Sun Forte 6.2 Compiler | cc -o sqr -O sqr.c -lm | 21081851083600.55859375000000 | -0.183594 | real 1:59.1 user 1:59.0 sys 0.0 |
McKerr Brian | Red hat 7.2 | PIV | 1.6GHz | gnu c compiler 2.96 | cc -o sqr -O sqr.c -lm | 21081851083600.38281250000000 | -0.0078125 | real 0m31.098s user 0m29.250s sys 0m0.040s |
McKerr Brian | Red hat 7.1 | PII | 400 | gcc 2.96 | cc -o sqr -O sqr.c -lm | 21081851083600.38281250000000 | -0.0078125 | real 2m54.350s user 2m53.180s sys 0m1.140s |
Tor Slettnes | Debian GNU/Linux 2.2.19 | Athlon | 1.2GH | gcc 2.95 | gcc test.c -o test -lm | 21081851083598.38281250000000 | 1.99219 | real 2m7.523s user 2m3.160s sys 0m0.020s |
L Carver | Solaris 8 | Ultra IIi | 440 | using Forte 6.1, "-O1" optimization | cc -o sqr -xO2 ~/src/sqr.c -lm | 21081851083600.55859375000000 | -0.183594 | real 174.59 user 168.24 sys 0.03 |
L Carver | Solaris 8 | Ultra IIi | 440 | using gcc-2.95.3 | gcc -o sqr -O2 ~/src/sqr.c -lm | 21081851083600.55859375000000 | -0.183594 | real 88.69 user 87.06 sys 0.00 |
L Carver | Solaris 8 | U III | 750 | gcc-2.95.3, "-O2" optimization | gcc -o sqr -O2 ~/src/sqr.c -lm | 21081851083600.55859375000000 | -0.183594 | real 57.60 user 57.47 sys 0.00 |
L Carver | Solaris 8 | U III | 750 | using Forte 6.1, "-O2" optimization | cc -o sqr -xO2 ~/src/sqr.c -lm | 21081851083600.55859375000000 | -0.183594 | real 119.15 user 118.89 sys 0.00 |
L Carver | HP-UX 11 | PA-8600 | 550 | HP Ansi C B.11.11.02, "-O1" optimization | cc -o sqr -O1 ~/src/sqr.c -lm | 21081851083600.55859375000000 | -0.183594 | real 3:20.8 user 3:20.7 sys 0.1 |
L Carver | AIX 4.3.3.0 | RS-64-III | 500 | IBM C (ibmcxx.cmp) 3.6.6.4, "-O2" optimization | cc -o sqr -O2 ~/src/sqr.c -lm | 21081851083600.55859375000000 | -0.183594 | Real 341.19 User 170.95 System 0.01 |
On top of this Philip C Tsao reversed the for loop - for(i=1000000000;i>0;i--) - and discovered that most systems produce a marginally different answer. On SPARC this is unaffected by typing i as an integer and initializing it as 1E9.
Well...
I included this example in a article about mismatch between Sun strength as a company
and its weakeness as a NASDAQ share and received an extended response from
some Sun engineers. One them (Tarsy) has now (mid 2004) written a much more
WHile you're here, how about taking look at The Unix Guide to
Defenestration?