The Spring Cray User Group Meeting (CUG) took place in the Harbour Castle Westin Hotel in downtown Toronto, from April 9th until April 13th. The conference was hosted by the Ontario Centre for Large Scale Computation (OCLSC), and its theme was ``Networked SuperComputing''. In this report, I have attempted to condense the essential messages from the talks I attended, and from other attendees with whom I spoke. The report also contains a description of the Reception (hosted by Cray Research International (CRI)) and the Conference Dinner.
The Conference proper began on the Tuesday, this day being taken up with various Special Interest Committees and registration.
A discussion list on Unicos had been set up. To subscribe, people were invited to send mail to unicos-request@violet.berkeley.edu. A future enhancement to CDBX would include the possibility in the X windows environment of plotting debugger variables against one another. A cft77 defaults file called .cft77rc would allow the user to specify his preferred standard compiler options. The San Diego SuperComputer Centre is now maintaining a software source archive. Access is by site, with an individual guest account, or by anonymous ftp. Contributions were sought. Persons interested in submitting software were encouraged to contact schroeder@sds.sdsc.edu. Some already available software included utilities for converting to and from IEEE floating point/VAX and VAX text to/from Unicos text.
The keynote address was given by Fred Weingarten from the Office for Technology Assessment at the White House. Their business was long term strategies for science in the U.S. It was a typical keynote address, the only information imparted being that the intention is to upgrade NSFNET to T3, which most people know anyway.
J.Rollwagen, the Chairman of CRI, gave a very polished presentation in which he emphasised the importance of simulations of the environment using Crays. In the handout that had been distributed before the conference by OCLSC, they had warned that ``Toronto weather was unpredictable''. This gave Rollwagen the opportunity to crack a rather good joke, the upshot of which was that the Toronto Weather Centre clearly needed another Cray. He emphasised the importance of sharing SuperComputer facilities, which allowed the attack of problems from many viewpoints. This led him to the importance of high speed networks. He reviewed the evolution of interest in Supercomputing, which he supposed had been vitalised by its coupling with high speed graphics. He estimated that less than 10% of the scientific community had been exposed to a SuperComputer, and said that it was part of CRI's mission to make this figure more than 90% by the end of the century. Cray was optimistic about producing a 100 GigaFLOP machine by the mid-1990s, and expected a TeraFLOP machine by the end of the century. With current technology, such a machine would apparently need several MegaWatts of power to run!
Ed Masi (CRI) then reported on current plans at CRI. A special processor for the Cray Y-MP, having 256 MegaWords of main memory, would be announced on May 14th. CRI was acquiring SuperTek, a mini SuperComputer company. The Cray Y-MP 2E would be a market leader, air-cooled, and providing a price/performance ratio 5 times better than that of the IBM 3090.
Bob Ewald (CRI) (who was ``murdered'' later on at the conference ...) reported on CRI software. 15% of CRI budget was being expended on R and D, of which half was being used for software. UNICOS 6.0 would be POSIX compliant. cft77 version 4.0 would feature auxiliary arrays in the SSD. CDBX functionality was being improved, and more emphasis placed on X-windows features. TCP/IP and OSI were the target protocols for Cray software utilities. HPPI and FDDI were mentioned as future directions.
At this point in the proceedings, and much to the audience's amusement, Ed Masi got up and proceeded, with the aid of the conference ``free gift'', a pair of scissors, to cut Bob Ewald's tie in half! Bob Ewald then cut Ed Masi's tie in half. More on this strange behaviour below ...
At lunch I sat next to J.Barton from NASA/AMES, who seemed rather shocked at my conversational gambit that the mainframe's days as a time-sharing compute server were numbered. (Rather a tactless premise at a SuperComputer conference!) He told me that at NASA/AMES they had been very worried about mass storage problems, in particular the WORN problem: Write Once, Read Never. He said that in fact their fears had been unfounded.
In the afternoon, Charles Grassi (CRI) presented results from the PERFECT benchmarks on various machines. The PERFECT benchmarks are a suite of application programs that attempt to be an industry standard, Cray and IBM having had a large say in their composition. The results were staggeringly in favour of Cray machines, compared with the IBM 3090, the NEC SX2, and RISC workstations. Grassi pointed out that massively parallel computer system vendors all claim that applications should be re-written to run well on their systems.
Anders Grimsrud (CRI) presented the Cray Communications Primitives (CCP), which are a software toolset allowing very easy setting up of communications over TCP/IP via sockets. This was a ``black-box'' approach, with the user only having to decide between synchronous or asynchronous communications. He gave a very simple and straightforward example of sending and receiving data between the Cray and a Sun 4. He emphasised possible distributed computing pitfalls, namely high I/O across slow networks, and distributing the wrong tasks. It was unclear from the presentation what the exact status of the tools was, but interested parties were invited to contact Cal Kirchof at Mendota Heights.
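The CCP interface itself was not shown in enough detail to reproduce here. As a rough illustration of the kind of plumbing such a toolset hides, the following is a minimal sketch of a synchronous, length-prefixed block exchange over a TCP socket, written in Python rather than with CCP. The function names (send_block, recv_block, demo) and the 4-byte length header are assumptions made for this example only, and both ends run on the local machine instead of on a Cray and a Sun 4.

```python
import socket
import threading


def recv_exact(conn, nbytes):
    """Read exactly nbytes from a connected socket (recv may return less)."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_block(conn, payload):
    """Send one block: a 4-byte big-endian length, then the data."""
    conn.sendall(len(payload).to_bytes(4, "big") + payload)


def recv_block(conn):
    """Receive one block written by send_block."""
    size = int.from_bytes(recv_exact(conn, 4), "big")
    return recv_exact(conn, size)


def demo():
    """Exchange one block and an acknowledgement over the loopback interface."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    port = listener.getsockname()[1]
    received = []

    def serve():
        conn, _ = listener.accept()
        with conn:
            received.append(recv_block(conn))
            send_block(conn, b"ack")

    server = threading.Thread(target=serve)
    server.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        send_block(client, b"data from the Cray")
        reply = recv_block(client)  # synchronous: block until the reply arrives
    server.join()
    listener.close()
    return received[0], reply
```

Calling demo() returns the block seen by the receiving side together with the acknowledgement seen by the sender. An asynchronous variant would return control to the caller before the reply arrives, which is the one choice the CCP user was said to have to make.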
Professor W.Kahan, this year's winner of the Turing Award, spoke on the loss of precision due to how arithmetic was handled in hardware on the Cray. This was a very entertaining and amusing talk, but in essence serious. The whole problem centred around the lack of a guard bit in the hardware, that lost precision on, for example, subtraction. For very pointed triangles, calculation of the area using the formula ..... works correctly on IBM, DEC, HP, Sun and MIPS machines, but not on the Cray! The Liu test program for ``C'' runs on all commercially significant machines except Crays! There are apparently some machine numbers on the Cray that can be multiplied by 1 without error, but not by 0.3, for example! Kahan's thrust was that, due to these limitations, a lot of software was not distributed because it was intended to run on all machines, and it didn't on Cray. So people without a Cray were being penalised.
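The formula itself is not reproduced in this report. Purely as an illustration, and on the assumption that the example was of the kind Kahan has published elsewhere for needle-like triangles, the sketch below compares the textbook Heron formula with a rearrangement attributed to Kahan that relies on each subtraction being carried out with a guard digit. It runs in Python on IEEE double precision, so it cannot reproduce the Cray behaviour; the function names are chosen for this example only.

```python
import math


def heron_area(a, b, c):
    """Textbook Heron formula; loses accuracy for very pointed triangles."""
    s = (a + b + c) / 2.0
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


def kahan_area(a, b, c):
    """Rearranged formula; the parentheses must be kept exactly as written."""
    a, b, c = sorted((a, b, c), reverse=True)  # ensure a >= b >= c
    if c - (a - b) < 0:
        raise ValueError("side lengths do not form a triangle")
    return 0.25 * math.sqrt(
        (a + (b + c)) * (c - (a - b)) * (c + (a - b)) * (a + (b - c))
    )


if __name__ == "__main__":
    # A needle-like triangle: two long sides and one very short one.
    sides = (100000.0, 99999.99979, 0.00029)
    print("Heron:", heron_area(*sides))
    print("Kahan:", kahan_area(*sides))
```

On hardware with a guard digit the difference a - b is computed exactly when a and b are close, which is what the rearranged formula depends on. Without one, that step can already be wrong, and no amount of care later in the expression recovers the lost digits.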
The Reception was a very lavish affair, with copious Champagne flowing and mounds of such delicacies as king prawns, smoked salmon, and so on. I was less sure about an unidentified ``thing'' I ate in some sort of shell.
D.Thompson (CRI) described how closely Cray were working with ULTRANET. A 100 MBytes/s HPPI channel would be made available later this year. Cray's policy was to follow, and not break, network standards, but to expand them where necessary. They are actively researching new protocol suites for high speed environments. In the competition between Motif and OpenLook, Cray were resigned to the probability that they would have to support both.
J.Golio (CRI) spent ten minutes of his thirty-minute slot explaining what he was going to talk about in the remaining twenty minutes! Cray's FDDI test setup comprised a Y-MP and an X-MP connected via NSC DX4130 drivers to an FDDI dual ring. Also attached to this ring was a Sun 4 (FEI3 to Cray low speed channel), and an IBM 4381 via an NSC DX4420 driver. Another FDDI ring attached an Apollo DN10000 and the Sun 4. Throughput results included 25 Mbits/s Cray to Cray (after optimizing the read/write block size) and 12 Mbits/s Cray to Sun.
Cray would be undertaking an FDDI beta test in Europe, late April or early May 1990.
J.O'Neil from ARCO Oil and Gas read out a long list of his requirements for a ``dream'' networked environment. This was the first user presentation, and not a very exciting one.
Winston Lu (NASA/AMES) described how NASA/AMES had attempted to solve their storage problems. Their current system comprises an IBM 3083 and Cray Y-MP connected via a Hyperchannel 100. They have 84 GigaBytes of IBM 3380 disks. They have achieved transfer rates of 27 MegaBytes/s to the Cray, but were looking at the problems of connectivity to other systems, and at generalising the software access to storage. The proposed configuration was a StorageTek 4400 ACS holding 1.2 TeraBytes on cartridges, together with the Unix Storage Server provided with the ACS by StorageTek, a Cray X-MP connected to the Y-MP via a High Speed External Channel (HSX) (running at 800 Megabits/s), and an FDDI ring using NSC DX devices. Protocols would be TCP/IP and AFS.
J.Minton (Lawrence Livermore National Laboratory) detailed a message-passing system that had been developed to respond to the need for file migration between high bandwidth, low storage capacity machines and low bandwidth, high capacity machines. Caching was used to allow fast access to oft-used files, and to reduce network traffic. The goals of the system had been location transparency, single file access and fast file access. NFS had been rejected due to its need to mount remote file systems, and due to only partial caching of the file. The Andrew file system was rejected due to the fact that it uses different methods to access small and large files. The implemented system migrated files on the basis of elapsed time or low remaining disk space. Simple file locking, including lock reduction, ensured data integrity for the user.
C.Zheng (San Diego SuperComputer Centre) described a networked interface for text editors based on TCP/IP. This allowed a remote Sun user to edit his or her files sitting on the Cray. I still don't understand why people want to use a SuperComputer to edit text files.
D.Butler of LimitPoint Systems gave an extremely complicated and confusing talk on vector bundles, complex dimensions, topology and decomposition to low integer dimensions. It had something to do with graphics, which is, after all, just a matter of moving a drawing instrument to position (x,y) on a device, with or without making a line (if one disregards selection of line colour). The lecture room was almost empty by the time he finished.
The Cray Maths Library had been rewritten by J.Kiernan (CRI), who presented some detailed results on the improvement in accuracy this afforded. The emphasis during the rewrite had been on correctness of results, rather than speed of the algorithm. In fact, most mathematical functions used a look-up table, rather than a Taylor-series expansion. Kiernan pointed out that a good scalar algorithm is not generally a good vector algorithm, but he had attempted to optimise between both cases. The aim had been to produce answers within one ULP (Unit in Last Place) of the correct answer (i.e. the machine number returned would be the closest number to the correct answer). Presently, functions such as x**y are very erroneous, with as much as 14 out of 48 bits loss of precision!
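To make the ULP measure concrete, here is a minimal sketch of how the error of a computed result can be expressed in ULPs. It is written in Python (3.9 or later, for math.ulp) and uses 64-bit IEEE doubles, not the Cray's 48-bit mantissa; the function name ulp_error is an assumption made for this example and has nothing to do with the Cray library.

```python
import math
from fractions import Fraction


def ulp_error(computed, exact):
    """Distance between `computed` and `exact`, in units of the last place.

    `exact` may be an int, a float or a Fraction; the ULP size is taken at
    the double nearest to `exact`. All arithmetic is done with exact
    rationals so that the measurement adds no rounding error of its own.
    """
    exact = Fraction(exact)
    spacing = Fraction(math.ulp(float(exact)))
    return float(abs(Fraction(computed) - exact) / spacing)


if __name__ == "__main__":
    # The double next above 1.0 is exactly one ULP away from 1.
    print(ulp_error(1.0 + 2.0 ** -52, 1))
    # The literal 0.1 is the double closest to 1/10, so it lies within
    # half an ULP of the true value.
    print(ulp_error(0.1, Fraction(1, 10)))
```

A result that is the closest machine number to the true answer has an error of at most half an ULP by this measure. By comparison, a loss of 14 bits corresponds to an error of the order of 2**14 ULPs.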
In a general discussion afterwards, the opinion of the audience was sought as to whether accuracy or speed of result was more important. A long flight back to Europe ahead of me, I was overjoyed to hear a representative from Boeing state that for them the answer was easy: the company actually had a policy on this question, that accuracy of the result was of over-riding importance! Kiernan wondered openly how many application programs would now start producing quite different results with the new, accurate mathematics functions. The consensus was that we should be very grateful if such differences appeared, as it would show possible coding errors. I would be interested to know whether, for example, HEP Monte Carlo really needs accuracy at the expense of loss of execution speed.
The new library is invoked with the command segldr prog.o -lmv2.
All the attendees gathered at 7 o'clock for pre-meal drinks. It quickly became apparent that some mischief was afoot. There was a strange man in a nurse's outfit, sporting dark glasses and a Rastafarian hair-do. There was a short chubby fellow drinking heavily with a ``Say NO to X-MP'' badge on his tasteless Hawaiian shirt. Then there was a blousy loud-mouthed woman lurching from one politely chatting group to another. For a dreadful moment I thought CERN had sent a contingent from CS Group.
After an introductory speech from Ed West, the Conference Organiser, the blousy woman got up to speak. It seemed she was from the EPA (Environmental Protection Agency), and that that organisation had a bone to pick with Cray. Apparently Crays were harming the environment: the X-MP was a particularly bad offender. So they were all being recalled in 48 hours. A ripple of shock (well, laughter actually) ran round the audience. After chastising the audience for some while (``you've even polluted the water, I can tell that because no-one's drinking any'') she seemed to lose her voice, and finally drank copiously from a glass handed to her. The draught was evidently laced with something nasty, as with a blood-curdling death scream she sank to the floor. Up jumped the ``nurse'' to the platform, and pronounced her dead, and then, removing his Gregory Pecks and wig, revealed himself to be ``Sam Sharp, Cop, Metro Cop''. He would be investigating the murder.
Everyone then trooped into the dining area, a ballroom, and took their places. By chance, I sat next to Mary Zosel, the CUG President. Vincent was on the same table as Bob Ewald. Both Ewald and Mary Zosel would turn out to be key players in the drama unfolding. The meal began as Sam Sharp moved about the room, seemingly picking people at random to interview in connection with the murder. He interviewed Ed Masi ... how did Ed get along with Bob Ewald (remember, he'd cut his tie in half at the beginning of the conference)? Ed clearly didn't get along too well. Then he interviewed the short chubby guy with the X-MP badge. Sam was clearly trying to get a fix on who had a motive for killing the EPA woman.
Suddenly, from the kitchens a horrible groan could be heard. In burst Bob Ewald, blood all over his shirt. He stumbled amongst the tables, bringing plates and glasses down. Finally, in the middle of the ballroom, and with a hair-raising quiver, he died. A great cheer went up from the assembled user community. ``Bob Ewald'', pronounced Sam, ``is dead!''. He was unceremoniously dragged from the room by Sam and volunteers (of which there was no shortage). Sam returned to the room brandishing the murder weapon, the Conference free gift, the scissors. The audience hissed and booed at the depths to which the murderer had stooped. Sam continued interviewing.
As the interviews went on, a picture of intrigue, spite, lust, revenge and general sour grapes emerged. The short chubby guy had designed the seat around the X-MP, and got nothing for it. He hated Ewald. Another attendee, a German called Otto, had had a brilliant idea for a new model X-MP, the X-MP Sleep-E, which had a pull down bed in the middle (``ideal for Government employees''). He'd taken it to Ewald, who had laughed at him. Sam had found a letter from IBM on Ewald's body, congratulating him on his large IBM stock holding, and offering 1000 PC's for every Cray X-MP taken off the market. By this time, Sam had narrowed the field down somewhat. The murderer was either Otto, the chubby guy, Mary Zosel, or an enormous ex-Hell's Angel, who'd been questioned closely on his nefarious dealings with Ewald.
To cut what is already a long story short, the murderer turned out to be Mary Zosel. I'm afraid I can't remember what her motive was supposed to have been, as by this time I was quite a few Martinis and glasses of rough wine down the slope. Maybe Vincent can remember. The other notable aspect of the Conference Dinner was the enormous piece of beef we were each served with. This would have made a very decent Sunday joint for a family of four.
D.Sadler (CRI) gave an overview of Cray's products and thinking on networking. The currently available hardware is the FEI3, the HSX 100 MegaByte link, the 50 Megabit Low Speed Channel and the NSC Hyperchannel. Software comprises TCP/IP (BSD 4.3 Tahoe) with linemode telnet, NFS version 3.2.2 with Yellow Pages and PCNFS, and X-windows version 11, releases 3 and 4. UNICOS 6.0 features TCP socket interface improvements, OSI and TCP common configuration files, an SNMP upgrade, bftp support, the IP TOS option, a new Trace capability, tuning information, telnet diagnostics, a new MTU discovery report, and FDDI support. In the future, there would be HPPI support, large windows for TCP, unified OSI support, BSD 4.4 compliance, and IP security features. The policy on NFS was to keep pace with the Sun releases. For X-windows, the MIT releases were being followed. Solutions for 3D visualisation, rendering, animation and batch graphics were being sought. Effort is going into porting PEX (PHIGS extensions for X-windows). OpenLook and Motif were also being actively evaluated.
A User Level Dynamic Job Mix Scheduler was described by M.Wan (SDSC). The environment addressed includes 3000 users, 100 of whom are logged in on the Cray at prime time, in about 300 different research projects. Wan showed how a program that calculated a good batch job mix by evaluating a charge factor based on the ``nice'' value could improve throughput. Every five minutes, the program dynamically adjusted the number of large memory jobs allowed to run, and how resources were distributed amongst the running jobs. It also calculated the optimal job mix at the current time, and a migration path towards that optimal mix from the current mix. Some running jobs could be allocated a period of wall-clock time during which they could not be suspended. Low priority jobs were of course preferentially suspended. The system described could successfully handle up to sixty NQS jobs (typically forty running, and twenty waiting).
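The talk did not give the algorithm in enough detail to reproduce it. As a toy illustration of one ingredient only, namely suspending low priority jobs first while respecting a guaranteed wall-clock window, here is a minimal Python sketch. Everything in it is an assumption made for the example: the Job fields, the use of memory as the single constrained resource, and the function name choose_suspensions. It is not the SDSC scheduler.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nice: int                # higher nice value means lower priority
    memory_mw: int           # memory in use, in MegaWords (hypothetical unit)
    protected: bool = False  # True while inside its guaranteed wall-clock window


def choose_suspensions(running, memory_limit_mw):
    """Return the names of jobs to suspend so that memory use fits the limit.

    Jobs are considered from lowest priority (highest nice) upwards, with
    larger jobs first among equal priorities. Protected jobs are never chosen.
    """
    used = sum(job.memory_mw for job in running)
    candidates = sorted(
        (job for job in running if not job.protected),
        key=lambda job: (-job.nice, -job.memory_mw),
    )
    to_suspend = []
    for job in candidates:
        if used <= memory_limit_mw:
            break
        to_suspend.append(job.name)
        used -= job.memory_mw
    return to_suspend


if __name__ == "__main__":
    jobs = [
        Job("a", nice=0, memory_mw=10),
        Job("b", nice=10, memory_mw=20, protected=True),
        Job("c", nice=19, memory_mw=15),
        Job("d", nice=5, memory_mw=8),
    ]
    print(choose_suspensions(jobs, memory_limit_mw=30))
```

A scheduler of the kind described would rerun a decision like this at each five-minute interval, and would also have to decide which waiting jobs to start, which this sketch leaves out.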
Joy Foglesong (what a lovely name!) and Carole Hogan described the system lately licensed as UniTree by General Atomics, and developed as the LLNL Storage System. This is a file access system, a sort of grandiose FATMEN, that allows equivalent access whether the file is local or remote. They preferred to talk of objects rather than files, of course. The user sees a homogeneous storage medium, and is unaware of where the objects reside. In fact ``nothing in the system motivates him to find out''! A locking mechanism ensures object consistency. There is already a large body of user communities interested in buying UniTree, and there has already been a user requirements meeting. I spoke to Carole Hogan after the meeting, who said that they would be delighted to discuss technical requirements or enhancements to the current system with potential users. The person responsible for UniTree at General Atomics is Mike Hardy.
Stewart Ross (CRI) told of the probable dates for the last major releases of the VM Station (Autumn 1990), and the VMS Station (Summer 1991), amongst others. Cray ``added value'' would be in areas such as network batch job submission, reliable asynchronous file transfer and distributed programming tools. The network media would include FEI, NSC, UltraNET, FDDI and Ethernet. He introduced a new line of vendor environment products called SUPERLINK. SUPERLINK for VM would use the Remote Queuing System (RQS), and asynchronous file transfer would be with FTA. Other platforms would be the IBM RS/6000, the IRIS 4D and the DECstation 5000.
I'm afraid to report that Fred Crowner's tutorial on cdbx taught me nothing new, except that I need to use the X-windows interface instead of line-mode.
R.Watson (LLNL) talked entertainingly about ``bumps and potholes on the road to networked computing''. He highlighted infrastructure support problems such as connectivity, funding, workstation heterogeneity, troubleshooting and security. His laboratory was managed by ``benevolent anarchy'', with volunteers managing routers, address databases, Kinetics boxes, and so on. He wished for an organized body that would take end-to-end responsibility for networking at his lab (don't we all?). He believed that reducing the complexity of the network was the key to breaking some of the barriers. For instance, a star-structure was much easier to maintain than a long linear structure. One protocol suite, and one only, should be supported. Presently, workstations were being purchased with little or no coordination, by the users themselves, from different manufacturers, and in widely different configurations. Watson wanted purchases to be made through a central coordinating body (hear, hear), as users were not aware, in general, of the connectivity issues. It was a myth that Unix and TCP/IP created virtual homogeneity, but they certainly helped. He wanted all workstations to be removed, and replaced by X-terminals, centrally connected (in the star-structure) to banks of CPU servers. ``Let them log on there'', he said. I wondered what chance we would have at CERN of implementing such a hardline proposal! Essentially, then, Watson was in favour of reducing diversity, minimizing the number of network components, reducing the number of operating system types, and introducing centralization, with better planning and record keeping.
Karen Shaeffer (Sandia National Laboratory, Livermore) reviewed the current status in Working Group 10 (SuperComputer Profile) of the IEEE technical committee for Operating Systems (1003). She encouraged anyone not yet involved and with strong views to become so, as it was ``not too late''.