Draft Blog Entries

% fortune -ae paul murphy

Only 256 threads? naaaa

One report on Sun's memory and Rock announcements that I didn't cite yesterday, because it didn't quote any of the blurts, was by Timothy Prickett Morgan on itjungle.com.

In many ways his retelling illustrates what "value added" (i.e. interpretive) reporting should be all about: solid context setting, clear differentiation of fact from interpretation, and some interesting differences from the groupthink you usually find in the tech press.

There was one comment, however, that drew my attention more than others - here it is in context:

The rumour mill has been suggesting for a few months that a Rock processor code-named "Pebble" will be used in single-socket boxes, like the Niagara designs of today, and that another one code-named "Boulder" will have NUMA or SMP electronics that allow them to be lashed together into machines with two, four, or eight sockets in a single system image. Azhari would not confirm these details of the future Rock-based servers, but if this is true, then the future "Supernova" servers from Sun will bring to bear a lot of threads in a very modest box in terms of socket count. Each Rock core is expected to have two processing threads and its own floating point unit, which means an eight-socket box would have 256 threads --about the upper limit that an operating system can handle these days.

It's that throw-away comment about 256 threads being the current upper limit for concurrency in Solaris that got my attention - because he presents it with certainty, but it's not true.

He gets that number for Rock by taking 2 concurrent threads per core, multiplying that by 16 cores per CPU, and then imagining 8 CPUs in the box.
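That multiplication is easy to check. Here is a minimal Python sketch of it; the function name is mine, purely for illustration, and the inputs are just the figures quoted above, not anything Sun has confirmed:

```python
def hardware_threads(threads_per_core: int, cores_per_chip: int, sockets: int) -> int:
    """Total hardware threads, i.e. what the OS would see as schedulable CPUs."""
    return threads_per_core * cores_per_chip * sockets


# Rock figures as quoted in this post: 2 threads per core,
# 16 cores per CPU, 8 CPUs (sockets) in the box.
rock_threads = hardware_threads(threads_per_core=2, cores_per_chip=16, sockets=8)
print(rock_threads)
```

Change any one of the three inputs and the total scales with it, which is why the number says more about the rumoured box than about any operating system.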

From a Solaris perspective a 32-thread T1 processor looks like a 32-CPU SMP machine, with the minor proviso that you can't reserve more than seven cores for particular workloads because the OS can't be restricted to one thread. I'm guessing that Solaris will see Rock the same way - i.e. a 256-thread system would look to Solaris like an eight-board, 256-CPU SMP machine.

Within Solaris 10 and the current Solaris 11 pre-builds going on in the OpenSolaris community space there's a piece of magic called the Unified Process Model - or, more accurately, there's a missing piece: the old standard M x N processes-to-threads model. Instead, everything in Solaris is now done through lightweight processes [LWPs] - not quite the same as traditional threads, but close enough for this discussion.

So what are the limits? The SPARC hardware architecture itself has one - no more than 1023 CPUs in an SMP configuration, but that wouldn't apply to the coolthreads machines.

The most immediate limit, at least in the Solaris 10 binaries for SPARC, is that MAXPID is stored as a short - meaning that the kernel will silently treat any value above 30,000 that you choose for max_nprocs as 30,000.
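The effect of that silent cap can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not the kernel's code: the function and constant names are hypothetical, and the 30,000 ceiling is simply the figure quoted in this post.

```python
PID_CEILING = 30_000   # ceiling quoted in this post for Solaris 10/SPARC
SHORT_MAX = 2**15 - 1  # largest value a signed 16-bit short can hold


def effective_max_nprocs(requested: int) -> int:
    """Model of the silent cap: anything above the ceiling is treated as the ceiling."""
    return min(requested, PID_CEILING)


# Asking for more than the ceiling gets you the ceiling, with no warning.
print(effective_max_nprocs(50_000))
# A request below the ceiling is left alone.
print(effective_max_nprocs(20_000))
```

The ceiling sits a little under what a signed 16-bit short can represent, which is consistent with the storage type being the real constraint.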

If you're the Solaris equivalent of a Luddite, you can insist on using processes as your unit of measurement and then allocate a maximum of 4,000 threads per process, at least until you hit the million-thread limit set in tre.c - but this was, as the saying goes, "deprecated" in Solaris 9 and won't be supported after Solaris 10.

Within the unified process model, however, you can bypass this limit because the old process/LWP distinction still makes a difference - and the limit on the number of lightweight processes you can have is set by the size of the segkp structure, from which the kernel has to allocate 24K each time an LWP is created. In Solaris 10/SPARC that size defaults to 2GB, so the practical limit on LWPs is just about 87,000.
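The 87,000 figure is just division. Here is a minimal sketch, assuming binary units (2GB = 2 x 2^30 bytes, 24K = 24 x 2^10 bytes) and a fixed 24K cost per LWP; the names are illustrative and not taken from the Solaris source:

```python
SEGKP_BYTES = 2 * 1024**3   # default segkp size quoted above: 2GB
LWP_COST_BYTES = 24 * 1024  # segkp allocation per LWP quoted above: 24K


def max_lwps(segkp_bytes: int = SEGKP_BYTES, per_lwp_bytes: int = LWP_COST_BYTES) -> int:
    """How many whole LWP allocations fit in segkp."""
    return segkp_bytes // per_lwp_bytes


def segkp_needed(lwps: int, per_lwp_bytes: int = LWP_COST_BYTES) -> int:
    """Bytes of segkp required for a given number of LWPs."""
    return lwps * per_lwp_bytes


print(max_lwps())               # 87381
print(segkp_needed(1_000_000))  # 24576000000
```

Run the other way, the same arithmetic says a million LWPs would need roughly 23GB of segkp under these assumptions, which gives some idea of why that count takes kernel changes rather than tuning.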

Now, if you happen to work as a Sun kernel developer or have a big enough machine, lots of time on your hands, and the skills to work with the OpenSolaris code base, you can bypass both of these limits (and a dozen or more others) to allow a million runnable LWPs - something that I'm told has been done successfully on a 72-CPU 25K.

Notice, however, that in debunking his 256 number I've implicitly assumed that the virtual processors established under the coolthreads architecture can be considered equivalent to real CPUs - because he could have been trying to say that Solaris maxes out at 256 physical processors.

My understanding is that people have built Solaris on machines with more CPUs than that, but commercial implementations of more than 106 physical CPUs (i.e. 212 cores) have been limited, as far as I know, to clusters. In those, inter-machine communication is hardware assisted and reasonably fast, but it's not the case that a ten-machine cluster runs one kernel with 10 zones - so it's worth asking whether he would have a point if that's what he was trying to say.

In fact, he wouldn't. And we know that because some people at the University of Illinois at Chicago posed a closely related question: can the UltraSPARC T1 be treated as a general purpose, 32-processor parallel machine?

They tested that by running standard OpenMP code and came to the conclusion that it can - meaning first that masochists can downgrade the machine to run as a kind of locally connected grid, and more importantly that you really can treat each virtual CPU as if it were a real one.

All of which means that if you set all user-assignable tunables to their respective maximums in /etc/system, the default Solaris SPARC binaries should pretty much fire up and run on a coolthreads machine with close to 30,000 virtual CPUs - of course, you'd need modified boot PROMs along with 64TB of RAM, and you wouldn't be able to get any work done on the machine, but so what? - the point is that you could do it.

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.