Connectors in the Titan supercomputer at Oak Ridge National Laboratory have been repaired, and workers are preparing the world’s fastest machine for a second round of acceptance testing, an official said Monday.
That testing could allow the $100 million machine to be put into full production mode by the end of this month or early May, said Jeff Nichols, ORNL associate lab director for computing and computational sciences.
Titan won’t be available to researchers for a short period while the lab re-runs acceptance tests.
“We are communicating with the users to schedule that—once we have it back, it shouldn’t take but a few weeks to finish acceptance and get it into full production mode,” Nichols said.
In March, Nichols said hundreds of connectors were being re-soldered in Titan each week. The connectors had too much gold, and the solder was interacting with the gold on the connector pins, making the solder unstable and leading to cracks. There are about 20,000 of the pencil-sized connectors, which link the central and graphics processing units, or CPUs and GPUs.
Lab officials had almost completed acceptance testing once before, when workers noticed a degradation in communications between the CPUs and GPUs.
The acceptance testing includes a 14-day stability test that will ensure Titan is finishing problems, producing the right answers, and performing appropriately.