Type of Document: Dissertation
Author: Rodrigues, Arun Francis
Author's Email Address: arodrig6@nd.edu
URN: etd-05102006-124649
Title: Programming Future Architectures: Dusty Decks, Memory Walls, and the Speed of Light
Degree: Doctor of Philosophy
Department: Computer Science and Engineering
Advisory Committee:
- Peter Kogge (Committee Chair)
- Erik DeBenedictis (Committee Member)
- Jay Brockman (Committee Member)
- Sharon Hu (Committee Member)
Keywords:
- programming model
Date of Defense: 2006-04-18
Availability: unrestricted
Abstract
Due to advances in CMOS fabrication technology, high-performance
computing capabilities have continually grown. More capable hardware
has allowed a range of complex scientific applications to be
developed. However, these applications present a bottleneck to future
performance. Entrenched "legacy" codes ("Dusty Decks") demand
that new hardware remain compatible with existing software.
Additionally, conventional architectures face increasing challenges.
Many of these challenges revolve around the growing disparity between
processor and memory speed (the "Memory Wall") and
difficulties scaling to large numbers of parallel processors.
To a large extent, these limitations are inherent to the traditional
computer architecture. As data is consumed more quickly, moving that
data to the point of computation becomes more difficult. Barring any
upward revision in the speed of light, this will continue to be a
fundamental limitation on the speed of computation.
This work focuses on solving these problems in the context of
Light Weight Processing (LWP). LWP is an innovative technique which
combines Processing-In-Memory, short vector computation,
multithreading, and extended memory semantics. It applies these
techniques to try to answer the questions "What will a
next-generation supercomputer look like?" and "How will we program
it?"
To that end, this work presents four contributions:
- An implementation of MPI which uses features of LWP to substantially
improve message processing throughput
- A technique leveraging extended memory semantics to improve message
passing by overlapping computation and communication
- An OpenMP library modified to allow efficient partitioning of
threads between a conventional CPU and LWPs, greatly improving
cost/performance
- An algorithm to extract very small "threadlets" which can overcome
the inherent disadvantages of a simple processor pipeline
Files
Approximate download times are given as Hours:Minutes:Seconds.

| Filename | Size | 28.8 Modem | 56K Modem | ISDN (64 Kb) | ISDN (128 Kb) | Higher-speed Access |
| --- | --- | --- | --- | --- | --- | --- |
| RodriguesA052006.pdf | 1.11 Mb | 00:05:09 | 00:02:39 | 00:02:19 | 00:01:09 | 00:00:05 |