Type of Document: Dissertation
Author: Brenner, Paul Raymond
Author's Email Address: pbrenne1@nd.edu
URN: etd-07102007-123722
Title: Parallel Algorithms and Distributed Systems for Computational Biophysics
Degree: Doctor of Philosophy
Department: Computer Science and Engineering
Advisory Committee (Advisor Name, Title):
- Jonathan Sapirstein, Committee Chair
- Aaron Striegel, Committee Member
- Doug Thain, Committee Member
- Jeff Peng, Committee Member
- Jesus Izaguirre, Committee Member
Keywords:
- distributed systems
- replica exchange
- algorithms
- computational biophysics
Date of Defense: 2007-07-06
Availability: restricted
Abstract
The understanding of atomic scale biomolecular function is a key component in the prevention and treatment of disease. Computational biophysics has proven
essential in this regard, accelerating the development and analysis of new biomolecular
theories. The effective contribution of biophysical simulation is limited by the
computational complexity of the existing models. In this work new computationally
efficient parallel algorithms and distributed system frameworks are developed
to extend the capability of biophysical simulation. In tandem with this development,
I present the simulation and analysis of a target protein domain linked to cancer,
Huntington disease, and Alzheimer disease.
The Replica Exchange Method is a popular biomolecular sampling algorithm
that utilizes multiple simulations (replicas) to more rapidly overcome energy landscape
boundaries and accelerate sampling. The method has limitations in scale
related to the size of the biomolecular system and required number of replicas. I
introduce a novel all pairs exchange implementation of the algorithm that provides
an asymptotically fourfold speedup of conformation traversal for replica counts of 8
and larger with typical exchange rates. Experimental tests with the blocked alanine
dipeptide show a 100% sampling improvement according to potential energy averages and an ergodic measure. The cluster sampling rate for a target protein
domain was nearly twice that of the single exchange near neighbor method.
The method meets the detailed balance criterion for Monte Carlo methods and
introduces no new parameterizations, biases, or heuristics.
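As a point of reference for the single exchange near neighbor method mentioned above, the following is a minimal sketch of the conventional replica exchange swap step, assuming the standard Metropolis acceptance probability min(1, exp[(β_i − β_j)(E_i − E_j)]). It is not the all pairs exchange implementation introduced in this work; the function names, the `offset` parameter, and the example values are hypothetical and chosen only for illustration.

```python
import math
import random


def exchange_probability(beta_i, beta_j, energy_i, energy_j):
    """Standard Metropolis acceptance probability for swapping the
    configurations held at inverse temperatures beta_i and beta_j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Capping the exponent at 0 implements min(1, exp(delta)) without overflow.
    return math.exp(min(0.0, delta))


def neighbor_exchange_sweep(betas, energies, rng, offset=0):
    """Attempt swaps between adjacent temperature levels (i, i + 1).

    Returns a list where entry k is the index of the configuration held
    at temperature level k after the sweep. `offset` (0 or 1) selects
    which set of neighbor pairs is tried (an illustrative convention).
    """
    config_at = list(range(len(betas)))
    energy = list(energies)
    for i in range(offset, len(betas) - 1, 2):
        p = exchange_probability(betas[i], betas[i + 1], energy[i], energy[i + 1])
        if rng.random() < p:
            # Accepted: the two temperature levels trade configurations.
            config_at[i], config_at[i + 1] = config_at[i + 1], config_at[i]
            energy[i], energy[i + 1] = energy[i + 1], energy[i]
    return config_at


if __name__ == "__main__":
    rng = random.Random(0)
    betas = [1.0, 0.8, 0.6, 0.4]            # hypothetical inverse temperatures
    energies = [-12.0, -15.0, -9.0, -11.0]  # hypothetical potential energies
    print(neighbor_exchange_sweep(betas, energies, rng))
```

Each sweep of this scheme attempts swaps only between adjacent temperature levels, which is the baseline that the all pairs exchange results are compared against.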
The development of distributed systems for scientific computation is an active
research field propelled by the growing number of research projects relying
on computationally complex simulations as part of the discovery process. Many
proposed frameworks have been successfully matched with unique applications to
provide the computational capacity required. Only recently has more focus been
targeted toward the efficient management of the distributed data. I introduce a
‘processing in network storage’ distributed system framework that efficiently couples
computation with data management over heterogeneous, autonomous, and
distributed resources. The framework provides a fault tolerant, scalable, and
bandwidth conserving approach through the utilization of existing grid software
utilities and a new hybrid database/filesystem developed with our collaborators.
The performance is evaluated during the generation of 500 biomolecular simulations
producing over 1 million output files distributed over volunteer resources.
The correlation of atomic scale simulations with existing experimental techniques
provides complementary data sets that cross validate and more thoroughly
map biomolecular motion of interest. This correlation, however, is complicated
by the often disjoint nature of the observables accessible from simulation and
experiment. In this work, biophysical simulations of the isomerase PIN1 WW
domain reveal insight into promising reaction coordinates to help map simulation
observed recognition loop motion to experimental nuclear magnetic resonance
(NMR) results. Post-processing analysis methods and metrics, including dihedral distributions, conformational clustering, hydrogen bond determination, and committor
probability calculations, indicate that the observed motion of the arginine
12 residue is coupled to the multivariate conformational changes of the recognition
loop.
Files
Filename: BrennerP072007.pdf
Size: 2.85 Mb
Approximate Download Time (Hours:Minutes:Seconds):
- 28.8 Modem: 00:13:12
- 56K Modem: 00:06:47
- ISDN (64 Kb): 00:05:56
- ISDN (128 Kb): 00:02:58
- Higher-speed Access: 00:00:15
A marked file or directory is accessible from the campus network only.