OK, I know that people get sick of my science posts all the time, but hey, this time I get to post about science AND computers, soooo...
One thing I love about my job is that I get to go to all these cool conferences and not have to pay for it. So, I'm here in San Diego (which is quite cool both literally and figuratively speaking), but am without a connection (which is not cool), so I'm sitting in an internet cafe getting my net fix. The conference I am attending is the 18th annual Protein Society symposium, and I just got to hear a cool talk by Vijay Pande, the man behind the F@H project (in conjunction with Erik Lindahl) at Stanford. I'm going to try to talk to him later, but for now I thought you all might be interested to hear how your megaflops are being used.
So what is the project, really (from the creators' viewpoint)? As we all know, it is an attempt to mathematically model the folding of proteins (which has many applications in disease and such). But the question is: can we trust it? A little detail:
Folding all-atom models of a protein requires calculations on the order of millennia, that is, thousands of CPU-years per folding event. All that buys you a time scale in the microsecond to millisecond range (years of computing for 1x10^-6 to 1x10^-3 seconds' worth of data). It can take as little as 3 CPU-years to get 1 microsecond of data for a "good" (fast-folding) protein. The other problem is scalability: there is none. Throwing more processors at a single trajectory is like asking 60 students to complete a 1-hour test by each working for one minute; there is loss in communication, etc.
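For the curious, here's the back-of-the-envelope math behind those numbers. The rate of roughly 1 nanosecond of simulated time per CPU-day is my own assumption, reverse-engineered from the "3 years per microsecond" figure, not something Pande quoted directly:

```python
# Back-of-the-envelope: how much CPU time does one folding trajectory need?
# ASSUMPTION: a single circa-2004 CPU advances an all-atom simulation by
# roughly 1 nanosecond of protein time per day of wall-clock time.
NS_PER_CPU_DAY = 1.0

def cpu_years(target_seconds):
    """CPU-years needed to simulate `target_seconds` of protein time."""
    target_ns = target_seconds * 1e9            # seconds -> nanoseconds
    cpu_days = target_ns / NS_PER_CPU_DAY
    return cpu_days / 365.0

print(f"1 microsecond: {cpu_years(1e-6):,.1f} CPU-years")  # ~2.7, the '3 years'
print(f"1 millisecond: {cpu_years(1e-3):,.0f} CPU-years")  # ~2,700: millennia!
```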
Real-life folding experiments take on the order of micro- to milliseconds as well. So, to be useful (that is, to answer questions about the real world), computation must follow suit. Why is there this time-frame issue? Well, in terms of real-world events, if the average folding time is 10 microseconds, there are some molecules that will fold in 10 nanoseconds and some that take milliseconds, depending on the pathway (remember, thermodynamics tells us the energy difference between states is independent of the pathway taken). So, there is a statistical distribution of possibilities around that event (are you asleep yet? No? Good!). This illustrates the difficulty of the project. Sample too short, and you don't cover all the energy space available to a protein; sample at too-long intervals during the experiment, and you miss critical short-time data (like taking a movie and removing too many frames: it gets jerky. Imagine Yoda's fight scene in Star Wars Episode II viewed at only every tenth frame).
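That statistical spread is actually the loophole that makes distributed folding possible at all: if folding times are (to a first approximation) exponentially distributed, then thousands of short independent trajectories catch roughly as many folding events as one impossibly long trajectory. Here's a toy Monte Carlo sketch of that idea; the two-state/exponential-kinetics model is a standard textbook simplification I'm assuming, not a description of Pande's actual code:

```python
import random

random.seed(42)

MEAN_FOLD_TIME_US = 10.0   # ASSUME a two-state folder with a 10-microsecond
                           # mean first-passage (folding) time
N_TRAJECTORIES = 10_000    # many short independent runs: the distributed trick
TRAJ_LENGTH_US = 0.01      # each run covers only 10 nanoseconds

# For exponential kinetics, each short run folds with probability
# p = 1 - exp(-t/tau) ~= t/tau, so the ensemble sees ~ N * t / tau events.
fold_times = [random.expovariate(1.0 / MEAN_FOLD_TIME_US)
              for _ in range(N_TRAJECTORIES)]
folded = sum(1 for t in fold_times if t <= TRAJ_LENGTH_US)

print(f"{folded} of {N_TRAJECTORIES} short runs caught a folding event")  # ~10
print(f"fastest molecule: {min(fold_times)*1000:.1f} ns, "
      f"slowest: {max(fold_times):.0f} us")  # the wide spread described above
```

The point: no single 10-nanosecond run can see an "average" folding event, but the fastest members of the ensemble fold early, and those are exactly the events the project harvests.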
So what can we do today? The bottom line: here is what you guys are doing right now:
120,000 CPUs are churning out 75 teraflops of SUSTAINED performance (not bursts) on any given day. That is the same as $100 million worth of supercomputers. So does distributed computing work? You betcha!!
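Just to sanity-check those figures (this arithmetic is mine, and the Earth Simulator comparison comes from the public Top500 numbers of the day, not from the talk):

```python
# Divide out the numbers from Pande's talk.
total_flops = 75e12      # 75 teraflops, sustained
n_cpus = 120_000

per_cpu = total_flops / n_cpus
print(f"{per_cpu/1e6:.0f} MFLOPS per volunteer CPU")  # ~625: believable for a
                                                      # 2004 desktop machine

# For scale: the Earth Simulator, then the fastest machine on the Top500
# list, sustained roughly 36 teraflops on Linpack.
```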
On the science end, they are trying to eliminate some of the weaknesses and cut down calculation times. Looking at why some proteins fail to fold, Pande said that some fold very slowly, so the sims are simply not long enough. Also, there are intermediates: low-energy structures that are not the fully folded protein, and simulations get stuck in them. That is not too hard to sort out, and they are working on it. What is harder is solvent effects. Water often plays a crucial role, but it eats up a lot of compute time. Sometimes it can be left out.
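To see what "getting stuck" on an intermediate means, here's a toy Metropolis simulation on a made-up one-dimensional energy landscape (entirely my own illustration, nothing to do with F@H's actual force fields): a shallow well plays the intermediate, a deeper well the folded state, and at low temperature the walker can sit in the trap for ages before hopping the barrier:

```python
import math, random

random.seed(0)

# Toy landscape: shallow trap (the "intermediate") near x = -1,
# true folded minimum near x = +1, with a barrier in between.
def energy(x):
    return (x**2 - 1)**2 - 0.4 * x

KT = 0.15            # low "temperature": barrier crossings are rare events
x = -1.0             # start in the intermediate, i.e. already trapped
steps_in_folded_well = 0
N_STEPS = 200_000

for _ in range(N_STEPS):
    x_trial = x + random.uniform(-0.1, 0.1)
    # Metropolis rule: accept downhill moves always, uphill with prob e^(-dE/kT)
    if random.random() < math.exp(-(energy(x_trial) - energy(x)) / KT):
        x = x_trial
    if x > 0.5:
        steps_in_folded_well += 1

print(f"time spent 'folded': {steps_in_folded_well / N_STEPS:.1%}")
```

Depending on the random seed, the walker can spend most of the run rattling around the trap, which is exactly the pathology Pande described: the simulation is fine, it's just nowhere near long enough.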
There was a lot in his talk that was over my head, so I won't go into many other details. I really just wanted to let you guys know what is going on on the OTHER side of the screen: what a great project F@H is, and what prestige it has as a trailblazing idea. Fold with confidence that you are helping produce computing that is more powerful than just about any supercomputer on earth!